Sets of primers are described for use in the identification, classification or typing of an organism, allele or gene selected from class I HLA.

PRIMERS FOR IDENTIFYING, TYPING OR
CLASSIFYING NUCLEIC ACIDS



DNA-sequence analysis is rapidly becoming a standard tool
in modern molecular biology research. Examples of applications include:

Sequencing of unknown DNA-sequences, identifying novel genes in
stretches of sequenced DNA, predicting protein-sequence and -structure
from DNA-sequence alone and identification of known gene-variations
(sometimes called "typing" a gene).

Typing of a gene could be crucial in some applications. For
instance, organ-donation requires that the "immunological signature" of the
donor matches that of the receiver. This "signature" is mediated by the
Human Leucocyte Antigen (HLA) complexes (also known as Major
Histocompatibility Complex, MHC) on the cell surface, and the
corresponding genes are among the most varied in the human genome.
Considering the importance of organ donation, the shortage of organ-donors
and the fact that an organ cannot be stored for any longer time-periods,
a rapid and accurate typing of the HLA-genes is required in order
to make most use of the organs available for transplantations.

Another application where a rapid and accurate identification
of a gene is desired is when trying to identify unknown bacteria. A rapid
identification of the bacteria causing the illness of a patient makes it
possible to administer the correct medication early in the treatment of the
disease, thus reducing the discomfort for the patient. Since every
self-replicating organism so far studied uses ribosomes when translating mRNA
to proteins, analysis of one of the genes coding for the ribosome, for
instance the 16S rRNA in the case of prokaryotes, could be used to identify
the organism in question.

There are several ways in which a gene can be identified,
with the conceptually easiest being to sequence the entire gene and then
look at the result. The main drawback is that this approach is time-consuming,
and not easily scaled up using conventional methodology. A
new method, Arrayed Primer Extension (APEX), lacks this drawback.
APEX works by immobilising a large number of primers to a solid surface,
thus creating a DNA-chip. These primers are constructed to be
consecutively overlapping over the entire gene of interest, so that every
base in the gene will have a primer to its 5'-end. By adding fluorescently
labelled dideoxynucleotides, the primers will then be extended by one
nucleotide using the sample DNA as template. It will thus be easy to check
which nucleotide was incorporated, which in turn tells you the entire
sequence of the sample DNA.

Since some genes, like the HLA and 16S rRNA, have a large
number of known variations, a prohibitively large number of primers have to
be created in order to probe for all possible combinations of variant
positions in the gene. Thus the array primer extension method APEX for
sequencing would need more than 16,000 primers if all DQB alleles [...]
in pairs should be combined the number of primers might be even higher,
which would be the situation for a heterozygote found in most individuals.

But this might not be necessary, if some variations always or 
never occur together. This needs to be studied though, and a way found to 
determine the least number of primers (and what their sequences are) 
required for unambiguously identifying those genes. 

An object of this invention is to find and implement an efficient
algorithm capable of doing just that. The algorithm should preferably also
take into account the melting points of the primers, so that the extension
reaction can take place under optimal conditions for all of the primers on
the chip. It should also minimise the number of "self-extended" primers, i.e.
primers that can extend themselves without any sample DNA. This
algorithm is then to be tested and evaluated on the HLA and 16S rRNA
genes. HLA is chosen partly because of the importance of rapid typing of
these genes, leading to the fact that there are many other methods to
which APEX can be compared. It is also because the HLA-genes are
"easy" to work with, since they rarely contain any insertions or deletions.
These kinds of variations in the gene could potentially create problems
when designing primers for APEX. The 16S rRNA, on the other hand,
contains insertions and deletions and can thus be used to see if the
algorithm can handle such variations.
The invention provides a method of identifying a set of
extendible primers for use in the identification, typing or classification of a
nucleic acid of known sequence having known polymorphisms wherein:

i) all possible nucleotide sequences of a chosen length of the
nucleic acid are identified and their corresponding extendible primers,

ii) at least one extendible primer is removed from the set
wherein the at least one primer removed identifies a segment of the nucleic
acid identified by at least one other primer.

Preferably the method includes between step i) and ii):

ia) potential extensions for each primer are identified with
respect to each nucleotide sequence,

ib) for each extendible primer the identified potential extensions
are compared to determine which pairs of sequences can be discriminated
by the primer.

Preferably a matrix of primers and pairs of primer extensions 
is prepared in binary form and is subjected to analysis by a set covering 
problem (SCP) algorithm as described in more detail below. 

The invention also includes a set of extendible primers, for
use in the identification, typing or classification of a nucleic acid of known
sequence having known polymorphisms, identified by the method as
defined. Preferably the primers are attached by 5'-ends to a surface of a
support on which they are presented in the form of an array.

In another aspect, the invention provides a set of extendible
primers, for use in the identification, typing or classification of a human
leucocyte antigen (HLA) gene as indicated, the set comprising about the
number of primers indicated and being capable of distinguishing about the
number of alleles indicated:

            HLA gene    Number of Alleles    Number of Primers

Class I     HLA-A               91                  172
            HLA-B              200                <1000
            HLA-C               47                   S4

Class II    DPA-1               11                   26
            DPB-1               74                  130
            DQA-1               17                  130
            DQB-1               34                   84
            DRB-1              192                <1000
            DRB345              35                   94

In another aspect, the invention provides a set of extendible
primers, for use in the identification, typing or classification of 16S rRNA,
wherein the set comprises about 210 primers and is capable of
distinguishing at least about 1207 different sequences.

In these aspects of the invention, the approximate number of
primers is indicated. As indicated below, it may be possible by the use of
the algorithms exemplified or other algorithms to generate slightly smaller
sets of primers capable of distinguishing the number of alleles or
sequences indicated, and these sets are envisaged according to the
invention. Of course, other primers may be present in addition to those
indicated as essential, and may be useful for checking purposes. The
number of alleles or sequences indicated represents the approximate
known number of polymorphisms or different sequences, and these will
surely increase with time.

In another aspect the invention provides a method of
identification, typing or classification of a nucleic acid of known sequence
having known polymorphisms, by the use of the set of extendible primers
as defined, which method comprises applying the nucleic acid or fragments
thereof to the set of extendible primers under hybridisation conditions and
effecting template-directed chain extension of extendible primers that have
formed hybrids. Preferably template-directed chain extension is effected
using four different fluorescently labelled chain-terminating nucleotide
analogues, and results are analysed by an imaging system such as total
internal reflection fluorescence (TIRF) or scanning confocal microscopy.
The various steps of the method may be performed as described in the
literature for the known APEX technique.

In another aspect the invention provides a kit for use in the
identification, typing or characterisation of a nucleic acid of known
sequence having known polymorphisms, comprising the set of extendible
primers as defined.

In another aspect the invention provides an array of sets of
extendible primers as defined, for the simultaneous identification, typing or
classification of two or more different HLA genes.

With the present invention it has been realised that where a
number of different alleles are to be identified, the total number of primers
required to distinguish each of the alleles could be reduced as some
primers would be common to all of the alleles, for example. Thus, with the
present invention complete sets of primers for identification of each allele
are identified and then the total number of primers in the combined sets is
reduced using predetermined rules.

Furthermore the present invention is based on the premise
that as the primers are used to identify the presence or absence of a
particular nucleotide sequence in any allele, the specific nucleotide that
extends any particular primer is of less relevance than simply whether the
primer has been extended. Thus, the problem of reducing the overall
number of primers is greatly simplified, rendering the problem one suitable
for treatment as a Set Covering Problem (SCP).

Embodiments of the present invention will now be described
by way of example with reference to the accompanying drawings and
examples, in which:

Figure 1 is a diagram of a signal matrix in accordance with
the present invention;

Figure 2 is a diagram of the corresponding binary matrix for
the signal matrix of Figure 1;

Figure 3 is a flow diagram of the steps for reducing the primer
set in accordance with the present invention.

The following is an explanation to assist in an understanding
of the principles underlying the manner in which the number of primers
used in the identification of a plurality of sequences may be reduced.

Theoretically the number of primers required to identify k
sequences grows as O(kl), where l is the length of the sequences, as each
sequence requires l primers. However, the less the sequences differ from
one another, the fewer primers are required as many of the primers
required for identification of a first sequence may also be of use in
identification of another sequence. This effect becomes more pronounced
the greater the number of sequences to be identified and the greater the
similarities.

Considering an initial set of n primers required in the
identification of k sequences, a signal matrix of k x n can be constructed.
Each element in the matrix represents the signal, if any, that is generated
by a particular primer with respect to a particular sequence. The signal will
either be one of the four nucleotides 'A', 'C', 'G' or 'T', or no signal.
Figure 1 is an example of such a signal matrix where, for example, the
signal generated by primer 2 with respect to sequence 3 is 'T'.

The signal matrix is then converted into a binary matrix that
represents whether the signals for any particular primer differ with respect
to different sequences. Thus, again with respect to primer 2, the same
signal 'G' is generated for both sequences 1 and 2 but a different signal 'T'
is generated with respect to sequence 3. The binary matrix is constructed
by considering each column (each primer) of the signal matrix and
comparing each signal in that column in turn. Thus, as shown in Figure 2,
the first row of the matrix represents a comparison of the signals for the first
and second sequences, the second row represents a comparison of the
signals for the first and third sequences and the third row represents a
comparison of the signals for the second and third sequences. Binary '0'
represents the comparison revealing the same signal and binary '1'
represents the comparison revealing different signals. In the case of primer
2, as mentioned earlier, the signals for the first and second sequences are
the same ('0') whereas the signals for the first and third sequences are
different ('1'). This conversion produces a matrix m x n where m=(k(k-1))/2.
Hence, for large numbers of sequences, m grows approximately as the
square of the number of sequences. Figure 2 shows the binary matrix for
the signal matrix of Figure 1.
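
Purely by way of illustration, the conversion from signal matrix to binary matrix may be sketched in Python as follows. The signal values are hypothetical (they are not the contents of Figure 1, although primer 2 is given the 'G', 'G', 'T' pattern mentioned above), and the names `SIGNALS` and `binary_matrix` are introduced here only for the example.

```python
from itertools import combinations

# Hypothetical 3-sequence x 4-primer signal matrix (rows = sequences,
# columns = primers). None stands for "no signal". Illustrative values only.
SIGNALS = [
    ["A", "G", "C", None],  # sequence 1
    ["C", "G", "C", None],  # sequence 2
    ["A", "T", "C", "A"],   # sequence 3
]


def binary_matrix(signals):
    """Return (pairs, rows).

    pairs lists the sequence pairs in the order (1,2), (1,3), (2,3), ...
    rows has one row per pair and one column per primer, holding 1 where
    the two signals differ and 0 where they are the same.
    """
    n_primers = len(signals[0])
    pairs = list(combinations(range(len(signals)), 2))
    rows = [
        [int(signals[a][j] != signals[b][j]) for j in range(n_primers)]
        for a, b in pairs
    ]
    return pairs, rows
```

For k = 3 sequences this gives m = 3 rows, in agreement with m = (k(k-1))/2.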
As the primers are required to enable the differentiation of
sequences from one another, the reduction of the signal matrix to a binary
matrix, representing differences in the signals obtained for different
sequences, distils that element of information necessary to enable a
selection of the minimum number of primers necessary to identify the
individual sequences. From the binary matrix the least number of columns
are selected such that each row contains at least one non-zero element.
Thus, if one of the columns contained all '1's only that one column would
be required. However, in the case of Figure 2, there is no single column
containing all '1's and so two columns must be selected, for example
primers 1 and 2. Primers 1 and 2 together enable each of sequences 1, 2
and 3 to be differentiated and so the remaining primers are redundant.

Where large numbers of sequences and primers are involved,
the binary matrix renders the data contained within that matrix suitable for
mathematical analysis. Once the selection of the reduced number of
primers has been made, though, it is the signal matrix that is required
during the use of the primers in the identification of the different sequences.
Thus, the signal matrix is used to 'decode' the results of any analysis using
the reduced number of primers.
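
As a minimal sketch of such 'decoding', assuming the observed signals are available as a mapping from primer index to signal, the sequences consistent with a measurement may be looked up in the signal matrix as follows. The matrix values and the function name `decode` are hypothetical and serve only as an example.

```python
# Hypothetical signal matrix (rows = sequences, columns = primers);
# None stands for "no signal". Illustrative values only.
SIGNALS = [
    ["A", "G", "C", None],  # sequence 1
    ["C", "G", "C", None],  # sequence 2
    ["A", "T", "C", "A"],   # sequence 3
]


def decode(signals, selected, observed):
    """Return the indices of all sequences whose expected signals on the
    selected primers equal the observed signals.

    selected: column indices of the reduced primer set.
    observed: dict mapping each selected column index to the observed signal.
    """
    return [
        i
        for i, row in enumerate(signals)
        if all(row[j] == observed[j] for j in selected)
    ]
```

With primers 1 and 2 selected (column indices 0 and 1), an observation of 'A' and 'T' is matched only by the third sequence of this hypothetical matrix.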

In practice, large numbers of sequences and primers are
involved and the selection of a reduced set of primers cannot be performed
by simple inspection of the binary matrix. For large numbers of primers,
selection of a suitable reduced set of primers can be performed by treating
the selection as a Set Covering Problem (SCP). An SCP is an integer
optimisation problem and is well known in fields such as airline crew
scheduling, selecting manufacturing equipment and ingot mould selection
in steel production. In such large scale problems that cannot be solved
exactly (NP-hard), heuristics are used in order to generate a solution. As a
SCP is NP-hard, global algorithms and algorithms that identify local optima
are not very suitable on their own for a large scale SCP. They will simply
require far too much computation as they try to find a solution that can be
proven to be at least locally optimal. For this reason heuristic methods are
required instead. They do not claim to give even a locally optimal solution,
but are much faster.

Two known computational methods that have been found to
be effective in identifying reduced sets of primers are the 'greedy' algorithm
and Lagrangian relaxation algorithm.



Greedy Algorithm

The most simple heuristic is the greedy algorithm, where
columns are added one at a time. The column to be added in each step is
chosen so as to cover as many uncovered rows as possible (a row is
covered if it has at least one non-zero element). In other words, if S^r is the
set of columns already included in the solution at iteration r, and R^r is the
set of rows with no non-zero elements at iteration r, column j^r is selected
according to:

\[ j^r = \arg\min_{j \notin S^r} \frac{c_j}{P_j} \]
Equation 1

where c_j is the cost of column j and P_j is the number of rows in R^r that
column j covers.



This continues until all rows are covered, or until no more
columns exist which can cover any of the rows still uncovered. Instead of
minimising the term c_j/P_j, other terms can be used. Example terms are c_j,
[illegible], P_j or c_j/(P_j)^2. Greedy algorithms of this type are described in "An
Efficient Heuristic for Large Set Covering Problems", Vasko, Wilson, Naval
Research Logistics Quarterly 1984, 31:163-171, the contents of which is
incorporated herein by reference. The difference is in how much emphasis
to place on the cost of the column versus how many rows the column
covers. It is shown, however, that this entire class of heuristics share the
same worst case behaviour. If we denote the set of columns in the solution
as S and the solution value as z, then the worst case behaviour can be
described as:



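\[ z = \sum_{j \in S} c_j \le z^{*} \sum_{j=1}^{d} \frac{1}{j} \]
Equation 2

where z^* denotes the value of the optimal solution and

\[ d = \max_{j} \sum_{i=1}^{m} a_{ij} \]
Equation 3

a_{ij} being the element in row i and column j of the binary matrix.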

In other words, how much worse the heuristic solution is
compared to the optimal solution is dependent on the maximum number of
non-zero elements in the columns. The advantage is that this algorithm is
fast even though its time complexity is O(nm^2) (there can be a maximum of
m columns in the solution, i.e. the maximum number of iterations is m. For
each iteration the matrix is traversed once to find the next column to be
added). Altogether, we have that the time required to solve the problem in
the worst case scenario will grow as the number of sequences to the power
of five (four due to the number of rows, and one due to the number of
columns). In the case of 16S rRNA (see later), where we have ~1000
sequences, the matrix will have ~500,000 rows. The number of primers
(columns) is in this case ~250,000.
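
By way of example only, a greedy selection of this kind may be sketched in Python as follows. The sketch assumes the ratio of column cost to newly covered rows as the selection term and unit costs by default; the function name `greedy_cover` and the tie-breaking rule (the lowest column index wins) are choices made for the illustration.

```python
def greedy_cover(rows, costs=None):
    """Greedy heuristic for a set covering problem.

    rows:  binary matrix as a list of rows (one row per pair of sequences,
           one column per primer).
    costs: optional list of column costs (defaults to 1 for every column).
    Returns (chosen, uncovered): the selected column indices in the order
    they were added, and the set of rows that no column could cover.
    """
    n = len(rows[0])
    costs = costs or [1.0] * n
    uncovered = set(range(len(rows)))
    chosen = []
    while uncovered:
        best, best_score = None, None
        for j in range(n):
            if j in chosen:
                continue
            # Number of still-uncovered rows that column j would cover.
            p = sum(1 for i in uncovered if rows[i][j])
            if p == 0:
                continue
            score = costs[j] / p
            if best_score is None or score < best_score:
                best, best_score = j, score
        if best is None:  # no remaining column covers any uncovered row
            break
        chosen.append(best)
        uncovered -= {i for i in uncovered if rows[i][best]}
    return chosen, uncovered
```

Each pass of the outer loop traverses the matrix once and adds at most one column, which corresponds to the iteration count discussed above.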



Lagrangian relaxation

More sophisticated methods exist, which use other kinds of
heuristics. One heuristic capable of generating the most optimal solutions
is believed to be some kind of Lagrangian relaxation heuristic, where in
each iteration the Lagrange multipliers for the rows are used to
calculate the Lagrangian cost for the columns. Such a Lagrangian
relaxation heuristic is described in "A Heuristic Method for the Set Covering
Problem", Caprara et al, Technical Report OR-9545, Operations Research
Group, University of Bologna 1995, the content of which is incorporated
herein by reference. A near optimal vector of these costs is then calculated
by a subgradient algorithm, before being used as input to a greedy
algorithm. This is repeated until no improvements in the solution can be
made.

In Lagrangian subgradient methods the Lagrangian of the
original problem is considered instead of the original problem. In this case,
the Lagrangian will be

\[ L(u) = \min_{x} \sum_{j=1}^{n} c_j(u) x_j + \sum_{i=1}^{m} u_i \]
Equation 4

where u_i is the Lagrangian multiplier for row i. c_j(u) is the
Lagrangian cost associated with column j, and is defined by

\[ c_j(u) = c_j - \sum_{i=1}^{m} a_{ij} u_i \]
Equation 5

An optimal solution to Equation 4 is given by

\[ x_j(u) = \begin{cases} 0 & \text{if } c_j(u) > 0 \\ 1 & \text{if } c_j(u) < 0 \\ 0 \text{ or } 1 & \text{if } c_j(u) = 0 \end{cases} \]
Equation 6

L(u) can also be seen as an estimate of the lower bound for
the solution, i.e. the sum of the costs for the columns in the optimal solution.
The best estimate is obtained with an optimal multiplier vector u, but this
will require much computation, especially for a large problem. But
near-optimal multiplier vectors can be found within short time by using the
subgradient vector s(u) defined by

\[ s_i(u) = 1 - \sum_{j=1}^{n} a_{ij} x_j(u), \quad i = 1, \ldots, m \]
Equation 7

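u can be refined iteratively by using for example

\[ u_i^{k+1} = \max\left\{ u_i^{k} + \lambda \frac{UB - L(u^k)}{\lVert s(u^k) \rVert^2} s_i(u^k),\; 0 \right\} \]
Equation 8

where λ > 0 is a step-size parameter and UB is an upper
bound on the solution value. To solve the SCP, first a near-optimal
multiplier vector u is found. This is used to construct a solution, and a new
multiplier vector is found, and so on until convergence is reached.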
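
A minimal sketch of such a subgradient iteration is given below in Python, assuming the standard Lagrangian relaxation of a set covering problem in which every row is to be covered once. The function names, the starting vector of zeros, the fixed step-size parameter and the fixed iteration count are assumptions made for the illustration and are not taken from the cited report.

```python
def lagrangian(rows, costs, u):
    """Evaluate the Lagrangian for multiplier vector u.

    Returns (L, x, s): the lower bound L(u), a minimiser x(u) and the
    subgradient s(u), for a binary matrix given as a list of rows.
    """
    m, n = len(rows), len(costs)
    # Lagrangian cost of each column: c_j minus the multipliers of its rows.
    red = [costs[j] - sum(u[i] for i in range(m) if rows[i][j]) for j in range(n)]
    x = [1 if r < 0 else 0 for r in red]
    bound = sum(r for r in red if r < 0) + sum(u)
    s = [1 - sum(rows[i][j] * x[j] for j in range(n)) for i in range(m)]
    return bound, x, s


def subgradient(rows, costs, ub, lam=0.1, iters=100):
    """Refine u iteratively; return the best lower bound found and final u."""
    u = [0.0] * len(rows)
    best = float("-inf")
    for _ in range(iters):
        bound, _, s = lagrangian(rows, costs, u)
        best = max(best, bound)
        norm2 = sum(v * v for v in s)
        if norm2 == 0:  # zero subgradient: nothing left to update
            break
        step = lam * (ub - bound) / norm2
        # Multipliers are kept non-negative.
        u = [max(ui + step * si, 0.0) for ui, si in zip(u, s)]
    return best, u
```

For any non-negative u the value L(u) cannot exceed the cost of an optimal cover, so the best value returned is a valid lower bound.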
Another alternative computational method that may be
employed to solve such a SCP is 'surrogate relaxation' in which in each
iteration a corresponding continuous problem is solved and made feasible
before a sub-gradient algorithm is applied. Alternatively, genetic algorithms
may be employed in which the 'genome' consists of n bits, one bit for each
of the columns.

It should also be borne in mind that as the SCP operates on
the binary matrix which only represents differences in signals between
sequences for the same primer, a primer in the selected reduced set may
generate a negative signal rather than a positive signal (A, C, G or T). To
be sure that the sample does in fact contain a particular sequence it is
essential to ensure that for each sequence at least one primer generates a
positive signal. Furthermore, in practice redundancy is desirable as all
reactions may not occur as intended. Therefore, the least number of
positive signals as well as the least number of differences in the signal
pattern is preferably larger than one.

With reference to Figure 3, the following is a description of 
one method of selecting a reduced set of primers. 

Firstly, all possible primers are selected (10) using the
standard APEX procedure to produce a first set of primers. During this
selection a substring of the sequence to be analysed is used to construct
one primer, then the substring is displaced by one base and another primer
is constructed. This process is carried out from the start of the sequence
until the entire sequence has been covered. Both strands of DNA are used
and this is repeated for all sequences. The primers should be long enough
to be capable of discriminating between exact matches and mismatches
involving one or two nucleotide pairs. Conveniently, the primers are 13bp
long as this has been found to be sufficient to ensure the reaction, or longer
to increase hybrid stability. However, to avoid steric hindrance on the chip
each primer may be 5'-tailed. In this example, twelve 'T's are added to the
5'-end of the primer so that the final length of the primers is 25bp.
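
The tiling step may be sketched, by way of illustration only, as follows. The sketch assumes that each primer is written 5' to 3' and is taken directly as a window of the strand concerned; whether the window or its complement is synthesised depends on the strand convention adopted, which is not fixed here. The names `reverse_complement` and `tile_primers` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_primers(seq, length=13, tail="T" * 12):
    """Return the set of 5'-tailed primers obtained by sliding a window of
    `length` bases, one base at a time, along both strands of `seq`."""
    primers = set()
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - length + 1):
            primers.add(tail + strand[start:start + length])
    return primers
```

With the default values each primer consists of a 13-base window preceded by twelve 'T's, giving a total length of 25 bases.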

WO 00/65088    PCT/EP00/03636

Next all primers that are not suitable as primers are rejected
(12) and the rest is included in a primary primer set. Unsuitable primers
are those where the three bases at the 3'-end are complementary to any
substring of the primer. In some instances this can result in the primer
being extended by a neighbouring primer and not the sample DNA as a
template and for that reason such primers are considered unsuitable.
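
A minimal sketch of this rejection test is given below. It assumes antiparallel pairing, so that the three 3'-terminal bases are considered complementary to a substring when their reverse complement occurs anywhere in the primer (tail included). The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_extending(primer):
    """True if the three bases at the 3'-end could pair with some
    substring of the primer itself (or of an identical neighbour)."""
    tip = primer[-3:]
    return reverse_complement(tip) in primer


def primary_set(primers):
    """Keep only primers that pass the self-extension test."""
    return [p for p in primers if not is_self_extending(p)]
```

Under this assumption a poly-T tailed primer ending in 'AAA' is always rejected, since its 3'-end can pair with the tail.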

Also, any primers that would produce ambiguous signals are 
identified and rejected (14). A primer produces an ambiguous signal where 
it is not known which of the four bases is in the relevant position. 

Each of the remaining primers in the primary primer set is
then compared to each sequence in turn to determine whether the primer is
extendible by each sequence and if the primer is extendible the base with
which it would be extended is determined. A signal matrix of the primers
with respect to each of the sequences is thus generated (16).

In order for a primer to be extended using the sample DNA as
template, the three bases in the 3'-end of the primer must hybridise to the
DNA. Otherwise the enzyme responsible for the extension will not be able
to add a nucleotide to the primer. Of the rest of the primer (the poly-T tail
excluded), at most two mismatches are allowed, otherwise the primer-DNA
duplex is considered to be too unstable to be extended.
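
The extension rule may be illustrated by the following sketch. For simplicity it assumes that the primer (without its poly-T tail) is compared against the strand having the same sense as the primer, so that a hybridisation site appears as a near-identical stretch and the incorporated base is the base that follows that stretch. This convention and the function name `extension_signals` are assumptions made for the example.

```python
def extension_signals(core, strand, max_mismatch=2):
    """Return the set of bases by which `core` could be extended.

    core:   primer sequence without the poly-T tail, written 5' to 3'.
    strand: sequence of the same sense as the primer, written 5' to 3'.
    A site is accepted when the last three bases match exactly and the
    remaining bases contain at most `max_mismatch` mismatches.
    """
    signals = set()
    n = len(core)
    # Stop one base early so that a following base always exists.
    for start in range(len(strand) - n):
        site = strand[start:start + n]
        if site[-3:] != core[-3:]:
            continue
        mismatches = sum(a != b for a, b in zip(core[:-3], site[:-3]))
        if mismatches <= max_mismatch:
            signals.add(strand[start + n])
    return signals
```

An empty set corresponds to "no signal" for that primer and sequence.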

In ordinary PCR, all the bases must match in order for the
primer to be extended. But then the temperature is raised to the melting
point, T_m, of the primer in the extension step. In APEX, this reaction is
carried out at 45°C, which is around 10°-20° below T_m of most primers.
This means that the primers will hybridise to the DNA despite a few
mismatches, which is why two mismatches are allowed here.

In some cases a primer could hybridise to a sequence in
more than one position, and sometimes a primer could hybridise to both
strands of one allele and give different signals. In those cases all the
different signals are combined to form one resulting signal (e.g. 'A' and 'C'
together form 'M', which is the NC-IUB (NC-IUB, 1985) code for this
combination).
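
The combination of several signals into one ambiguity code may be sketched as follows, using the standard nucleotide ambiguity codes. The table name `AMBIGUITY` and the function name `combine` are introduced only for the example.

```python
# Standard nucleotide ambiguity codes, keyed by the set of bases covered.
AMBIGUITY = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}


def combine(signals):
    """Merge the individual signals of one primer for one sequence into a
    single code; returns None when there is no signal at all."""
    return AMBIGUITY.get(frozenset(signals))
```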

For each column of the signal matrix the entries for each row
are compared against one another, in other words for each primer the
signals produced by the primer for each sequence are compared against
each other. A binary matrix is thus generated (18) of the primers with
respect to the identity or difference of signals for pairs of sequences. The
binary matrix contains non-zero entries where the primer is able to
distinguish between a pair of sequences.

The number of pairs of sequences that each primer can
distinguish between is counted and a score is allocated to each primer
(20) in dependence on the total number of pairs of sequences counted.
Thus, the number of non-zero elements for each primer is counted.
Primers that are unable to distinguish between any pairs of sequences are
rejected (22) and the remaining primers are sorted (24) in order of their
score with the primers with the higher scores at the beginning.
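
By way of illustration, the scoring, rejection and sorting steps may be sketched as follows. The function name `rank_primers` is hypothetical, and primers with equal scores are simply left in their original order in this sketch.

```python
def rank_primers(binary_rows):
    """Score each primer (column) by the number of sequence pairs it can
    distinguish, drop primers with a score of zero and return the
    remaining column indices sorted by descending score."""
    n = len(binary_rows[0])
    scores = [sum(row[j] for row in binary_rows) for j in range(n)]
    kept = [j for j in range(n) if scores[j] > 0]
    # sorted() is stable, so equal scores keep their original order.
    return sorted(kept, key=lambda j: -scores[j])
```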

A core of primers is created next (26). The primer with the
highest score is selected. Where two primers with equal scores exist, the
number of positive signals is determined for each and the primer with the
greater number of positive signals is chosen. If both primers remain equal,
one is then selected arbitrarily over the other. After the main primer has
been selected, the first twenty (five times the desired redundancy which is
four here) primers giving positive signals for each sequence in turn are
selected for the core. All remaining primers are rejected.

A greedy algorithm is then run (28) using the core set of
primers to identify the minimum number of primers necessary to distinguish
each sequence. As the greedy algorithm is run, primers are added one at
a time with each primer being selected in turn in relation to the number of
uncovered rows it is capable of covering. When all rows are covered at
least four times the reduced set of primers is checked for any sequences
that have fewer than four positive signals and extra primers are added as
necessary to meet this minimum requirement.

A redundancy check is then performed (30) to identify
whether any more primers can be removed. During the redundancy check
each primer is "tentatively" removed in turn to see whether the remaining
primers meet the minimum requirements.

If not, the next primer is tried. Otherwise the primer is
temporarily removed from the set, and the process continues with the next
primer in line. This process continues until no more primers can be
removed, in which case the last primer to be removed is added back to the
set, and the next primer in line tentatively removed and so on. This can be
viewed as a depth-first search of a tree where the nodes are combinations
of primers, and the number of primers in each node is one less than in a
node one level above. The root node thus contains all primers from the
greedy algorithm. It has p (the number of primers after the greedy
algorithm) primers in it. It also has p child-nodes (because there are p
ways in which you can remove one primer from a set of p primers), each
with p-1 primers. Each of them has p-1 children with p-2 primers and so
on. In this way, all possible combinations of primers in the set fulfilling the
requirements are found, and those combinations with the same, least
number of primers are saved as the final primer sets.
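
A minimal sketch of such a depth-first redundancy check is given below. For brevity the minimum requirement is reduced to "every row of the binary matrix is covered at least e times"; the additional check on positive signals per sequence is omitted. The function names are hypothetical, and the search is exponential in the worst case, so the sketch is only suitable for small sets.

```python
def meets_requirement(rows, cols, e=1):
    """True if every row is covered at least e times by the columns."""
    return all(sum(row[j] for j in cols) >= e for row in rows)


def smallest_subsets(rows, start, e=1):
    """Depth-first search over subsets of `start` (assumed feasible),
    removing one primer at a time while the requirement still holds.
    Returns all feasible subsets of the least size found."""
    best = []      # feasible subsets of the smallest size seen so far
    seen = set()   # subsets already visited

    def visit(cols):
        nonlocal best
        if cols in seen:
            return
        seen.add(cols)
        if not best or len(cols) < len(best[0]):
            best = [cols]
        elif len(cols) == len(best[0]):
            best.append(cols)
        for j in cols:
            child = cols - {j}
            if meets_requirement(rows, child, e):
                visit(child)

    visit(frozenset(start))
    return best
```

Each call to `visit` corresponds to one node of the tree described above, and its children are the feasible sets with one primer fewer.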

Instead of applying the greedy algorithm to the core set, a
modified algorithm called CFT may be applied.

Lagrangian subgradient

This algorithm consists of three main phases: a subgradient
phase where a near-optimal multiplier vector is found, a heuristic phase
where a solution to the SCP is found and column-fixing, designed to
improve the results of the heuristic phase.

In the subgradient phase, a near-optimal multiplier vector u is
found using Equation 8. At the beginning, the starting vector u^0 used is
defined as

\[ u_i^0 = \min_{j \in J_i} \frac{c_j}{|I_j|} \]
Equation 9

where J_i is the set of columns covering row i and I_j is the set of rows
covered by column j.
Later calls use the last vector u before column fixing, and
apply a small perturbation before using it as the starting vector. The
perturbation is randomly (and uniformly) distributed in the range ±10% for
each element. The sequence of multiplier vectors is considered to have
converged when the improvement in L(u) in the last 50 iterations is smaller
than 0.1%, or when the number of iterations reaches 10 x m. The factor λ
in Equation 8 was set to 0.1 at the beginning, and was updated as follows:
Every 20 iterations, the best and worst lower bounds L(u) during those 20
iterations are compared to each other. If the difference is larger than 1%,
the value of λ is halved. If the difference is less than 0.1%, λ is multiplied
with 1.5. In the first call, the upper bound, UB, used is the sum of the costs
of the first primers that together cover all rows four times. Otherwise it is
the value of the best solution found so far.
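
The step-size update may be sketched as follows. The sketch assumes that the percentage difference is measured relative to the best lower bound in the window; the text does not state the reference value, so this choice and the function name `update_step` are assumptions.

```python
def update_step(lam, recent_bounds):
    """Adjust the step-size parameter from the lower bounds L(u) recorded
    during the last 20 iterations.

    The relative difference between the best and worst bound is taken
    with respect to the best bound (an assumption of this sketch).
    """
    best, worst = max(recent_bounds), min(recent_bounds)
    gap = (best - worst) / abs(best) if best else 0.0
    if gap > 0.01:    # bounds oscillate by more than 1%: halve the step
        return lam / 2
    if gap < 0.001:   # bounds move by less than 0.1%: enlarge the step
        return lam * 1.5
    return lam
```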

In the heuristic phase, the last vector from the subgradient
phase is used to generate a sequence of multiplier vectors (again using
Equation 8), and a feasible solution constructed for each of the multiplier
vectors. The procedure used to generate a feasible solution is a variation
of the greedy algorithm, where each column is scored according to

Equation 10

where R is the set of uncovered rows in each step. The
column with the lowest score, i.e. the column with the best "gain/cost"-ratio, is
added in each step to the solution. This continues until no improvements to
the best solution (i.e. minimum number of primers) have been made for 50
iterations.
After the heuristic phase, column fixing is applied to the
solution. Columns that are absolutely necessary in order for a row to be
covered (i.e. if there are only e columns covering a row and each row is to
be covered e times) are fixed. These fixed columns are then used as a
starting point for the greedy algorithm, and the first max{⌊200/m⌋, 1}
columns chosen therein are fixed as well.

These three phases are then applied again to the problem,
with the condition that the fixed columns must be included in the solution
this time. Columns already fixed in a previous round cannot be removed
from the solution. This goes on until either all rows are covered by the
fixed columns, or the cost of the fixed columns is larger than the estimated
lower bound for the entire problem or if no new columns were fixed in the
last iteration.

When the three phases are done, the problem is refined in
order to improve the solution. Here, each column in the best solution found
so far is scored according to

$$\delta_j = \max\{c_j(u^*),\, 0\} + \sum_{i \in I_j} u^*_i \, \frac{K_i - 1}{K_i}$$

Equation 11

where

$$K_i = |J_i \cap S|$$

Equation 12

J_i is the set of columns covering row i and S is the set of columns in the
solution. The term u*_i (K_i − 1) is the contribution of row i to the gap
between the estimated lower and upper bound of the problem. This is then
split uniformly between all columns in the solution covering that row.
Columns with small δ_j (contributing the least to the gap) are then likely to
be part of the optimal solution. The p columns with the smallest δ_j are then
fixed before the entire






algorithm is applied again to the resulting sub-problem. (Column fixing
here has nothing to do with column fixing after the heuristic phase, so
columns fixed there need no longer be fixed here.) p is the smallest value
satisfying



$$\sum_{k=1}^{p} |I_{j_k}| \ge \pi \, e \, m$$

Equation 13



where {j_k} is the set of columns in the solution ordered by
ascending δ_j and I_j is the set of rows covered by column j. π is in the range
0...1 and controls the percentage of rows removed after fixing: π = 1 means
that no rows will be left uncovered, while π = 0 means that no
columns will be fixed before reapplying the algorithm. (Since each row has
to be covered multiple times, in this case it is not actually the number of
rows but the number of elements covering the rows that is regulated by
π.) In the beginning, π is set to 0.3 and is multiplied by α = 1.1 if the best
solution so far was not improved in the last application of the three main
phases. If a better solution was found, π is reset to 0.3. Because of the
density of the matrices, the number of columns fixed in this step was also
set to be at least one more than in the previous iteration (if no
improvements were made). Otherwise the same number of columns would
be fixed in a number of iterations before the value of π is large enough to
allow more columns to be fixed.
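
The update rule for π and the "at least one more column" rule can be sketched as below. The function names are hypothetical, and capping π at 1.0 is an assumption of this sketch (the text only states that π lies in the range 0...1).

```python
def next_pi(pi: float, improved: bool, pi0: float = 0.3, alpha: float = 1.1) -> float:
    """Reset pi after an improvement, otherwise grow it by the factor alpha."""
    if improved:
        return pi0
    return min(1.0, pi * alpha)  # cap at 1.0 (assumption of this sketch)


def next_p(p_from_pi: int, previous_p: int, improved: bool) -> int:
    """Number of columns to fix: at least one more than last time if stuck."""
    if improved:
        return p_from_pi
    return max(p_from_pi, previous_p + 1)
```

Here `p_from_pi` stands for the value of p obtained from the condition above for the current π; the second function only enforces the extra rule used for dense matrices.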

The algorithm is iterated until either the value of the best
solution is less than the estimated lower bound, all columns in the best
solution found so far are already fixed in the refining step, or a time limit is
exceeded. The time limit in this case was arbitrarily set to as many
seconds as there were rows in the problem. However, the time limit is only
checked before the refining step. If it is not exceeded, a whole iteration of
the algorithm will be executed before another check is done. Here too a
check was done afterwards to see if primers could be removed without
breaking any constraints.

With this algorithm no pricing is performed. Pricing is used to
update the core problem, exchanging columns between the core problem
and columns outside the core. It was not included here since it was argued
that, since the costs of the columns are all the same, the best columns
would be those with the largest number of non-zero elements. These
would be the first columns to be added to the core, and the columns not
included in the core would most probably not be better than those included.
Also, the pricing step would require some computation, which would extend the
time required by this algorithm. As is, the computational requirement of this
algorithm is several orders of magnitude higher than for the greedy
algorithm. Finally, the main memory available in the computer puts a limit
on how large the problems can be. If pricing were included, all data would
not fit into the physical memory, forcing the computer to use a swap-file,
which would increase the computation times considerably.

Using both alternative algorithms described above a minimum 
number of primers were identified for various sequences. The results are 
set out below. 

It will be apparent that the initial manual rejection of primers,
steps (12, 14 and 22), need not be performed and instead the algorithms
can be applied to the original complete set of primers. However, the initial
rejection of obviously failed primer candidates can significantly reduce the
computational time required in the later stages. Similarly, the final
redundancy check (30) need not be performed, as in many cases
little or no reduction in the number of primers was achieved by this final
check.

Furthermore, although in the method described above the
primers were initially sorted in order of score, this need not be performed.
The algorithms for stripping out redundant primers are capable of operating
with any order of primers, including a wholly random order. However,
slightly better results were obtained when ordering by score was
performed.

Collecting sequences 

The HLA-sequences were available internally from
Amersham Pharmacia Biotech (release December 1997), and included 91
alleles from HLA-A, 202 HLA-B, 47 HLA-C, 11 HLA-DPA1 (coding for the
α-chain), 74 HLA-DPB1 (β-chain), 18 HLA-DQA1, 34 HLA-DQB1, 192 HLA-
DR1 and 35 sequences in all of HLA-DR3, -DR4 and -DR5. The length of
these sequences ranges from ~250 bp to ~1100 bp.

The 16S rRNA-sequences were collected from GenBank
(Benson et al., 1998), an annotated database of all publicly available DNA
sequences. Only a subset of all the available 16S rRNA-sequences was
used. The sequences used were all from organisms that could be
identified using either the MicroLog or the MicroStation system from Biolog
Inc., or the API systems from Counterpart Diagnostics. These systems
utilise differences in metabolism in order to identify the organisms, which is
the most common way of identifying micro-organisms today. Altogether,
1207 sequences from 523 different organisms were collected from
GenBank. 269 of those 523 organisms had only one 16S rRNA sequence
among those 1207 sequences. The length of these sequences is between
~1000 bp and ~1500 bp.



Data set    No. sequences    Mean length of sequences
DPA1        11               517
DPB1        74               288
DQA1        17               616
DQB1        34               490
DRB1        192              324
DRB345      35               400
HLA-A       91               944
HLA-B       200              900
HLA-C       47               1003
16S rRNA    1207             1452

Table 1: Details about data sets.






The program was written using the Microsoft® Visual C++®, 
version 5.0 compiler. It was executed on a PC with a Pentium® MMX 233 
MHz processor, 64 MB RAM and Windows® 95, unless otherwise 
indicated. All execution times are for the entire program, including I/O. 

As can be seen in Table 2, the binary SCP matrices were
quite dense. The density (i.e. the proportion of non-zero elements in the
matrix) usually lies around a few percent, depending of course on the
application. A higher density means that fewer columns are needed in
order to cover all rows. This is offset in this case by the fact that all rows
were required to be covered multiple times. Another consequence of this
high density is that the number of primers needed according to the greedy
algorithm could be much higher than in the optimal solution. (Recall that
the worst case behaviour of the greedy algorithm is a function of the largest
column-sum of elements.)



Dataset      DPA1   DPB1   DQA1   DQB1   DRB1    DRB345  HLA-A  HLA-B  HLA-C  16S rRNA
No. rows     55     2701   136    561    18336   595     4095   19900  1081   727821
Density (%)  47.89  20.73  36.31  42.18  24.98   37.70   36.31  32.33  30.41  2.04

Table 2: Some details about the binary SCP matrix. Data are
calculated for all primers in the primary set.

The program could be considered as consisting of two
phases. The first phase involves constructing all primers and finding out
what kind of signal they will get for each sequence. The second phase is
the optimisation phase, where the SCP is solved. Some details about the
first phase can be found in Table 3.



Dataset      DPA1   DPB1   DQA1   DQB1   DRB1    DRB345  HLA-A   HLA-B   HLA-C  16S rRNA
First set    1747   1885   2487   2891   3891    3031    4756    4994    4293   247877
Primary set  1333   1475   2166   2730   3651    3016    3886    4585    3354   247877
Core set     106    321    213    244    385     203     595     750     338    2377
Time (s)     4.67   6.81   11.26  18.51  42.29   14.56   124.74  286.82  61.29  150632

Table 3: Number of primers in different stages of the algorithm and time to
get signals for all primers. The numbers of primers in the core are for
homozygotes.

One explanation for this high density is that the sequences in
the data sets are quite similar to each other, so that most primers will
hybridise to and give signal for more than one sequence (either the same
or different signals). This is also indicated in Table 3, where for some data
sets there is a noticeable drop from the number of primers in the first set to
the number of primers in the primary set. Most of this reduction is due to a
primer having the same signal for all sequences, which in turn means that
all sequences have a substring that is similar enough for the primer to
hybridise to and that the nucleotide after the primer is the same for all
sequences. In contrast, the 16S rRNA data set has a much lower density,
and no reduction in the primers going from the first set of primers to the
primary set. As the sequences in this data set come from organisms which
might be only distantly related to each other, there need not be as much
similarity between the sequences as there is in the HLA data sets. Another
explanation is this: If all k sequences except one give the same signal for a
primer, that column in the binary SCP-matrix will have k−1 non-zero
elements. The density (for that column) will then be (k−1) / (k(k−1)/2) = 2/k.
In other words, the density will be higher for smaller values of k, and
smaller for larger values. This means that it would be "natural" for smaller
matrices to have higher densities, and larger matrices to have lower
densities.
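
The relation between the number of sequences k, the number of rows k(k−1)/2 and the single-outlier column density 2/k can be checked with a small sketch. The function names are chosen for this example only.

```python
def n_rows(k: int) -> int:
    """Number of sequence pairs, i.e. rows in the binary SCP matrix."""
    return k * (k - 1) // 2


def single_outlier_density(k: int) -> float:
    """Column density when exactly one of k sequences gives a different signal."""
    return (k - 1) / n_rows(k)  # simplifies to 2 / k
```

For example, `n_rows(11)` gives the 55 rows of the DPA1 matrix and `n_rows(1207)` gives the 727821 rows of the 16S rRNA matrix.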

In the second phase, solving the SCP, a few different
approaches were tried. The results, the minimum number of primers
needed and the time required to find this number, can be found in Table 4
and Table 5. Even though the worst case behaviour of the greedy
algorithm is not so good in this application, the results are not much worse
than when using a Lagrangian subgradient (CFT) method. The greedy
algorithm typically needs two or three more primers, while the computation
times are much lower for the greedy algorithm.

The results show that it is worthwhile to check the results
from the greedy algorithm for redundancy. In all cases except one, primers
could be removed and the resulting primer sets still fulfil all requirements.
This is not true for the CFT algorithm, however, as there is only one
instance in which the result could be improved. On the other hand, since
there is some randomness in the CFT algorithm (an old multiplier vector is
disturbed randomly before being used as a starting vector in the next
iteration), the results can differ from one execution of the algorithm to
another. Sometimes the results can be improved, and sometimes not
(results not shown).



Dataset    DPA1  DPB1  DQA1  DQB1  DRB1  DRB345  HLA-A  HLA-B  HLA-C  16S rRNA
Greedy     11    42    32    31    48    24      73     103    51     210
Time (s)   0.27  1.37  0.61  0.71  11.5  0.66    4.61   31.36  1.15   9921.48*
Final      11    41    30    29    44    21      72     99     47     107^
Total (s)  0.27  1.81  0.72  0.88  30.3  0.71    6.4    85.14  1.76   >300000^

Table 4: Number of primers after the greedy algorithm and time
spent by it. Also the final number of primers after the check for redundancy
and the total time spent solving the SCP. *Value from a 300 MHz Pentium II
with 512 MB RAM running Windows NT 4.0. ^The computation was halted before
completion due to time constraints.



Dataset    DPA1   DPB1     DQA1   DQB1    DRB345  HLA-A    HLA-C
CFT        10     38       26     27      20      69       47
Time (s)   10.22  2748.92  60.80  372.56  427.32  4547.33  1091.37
Final      10     38       26     27      20      69       45
Total (s)  10.99  2749.14  60.86  372.61  427.38  4548.49  1111.70

Table 5: Results using modified algorithm CFT.

One reason CFT is not much better than the greedy algorithm
could be that it was designed for other instances of SCP. The SCP arising
in this application differs in three aspects from those: A) the density is much
higher, B) all rows are to be covered multiple times and C) the costs of all
columns are the same.

A comparison was made between the results from the greedy
algorithm and from CFT in Table 6. Most of the primers (70% or more)
were chosen by both algorithms, indicating that these primers are likely to
be part of an optimal solution. However, this is only an indication, as the
only way to prove this is to find an optimal solution. This will require far too
much time even for the smallest data set, as the problem is NP-hard.



Dataset      DPA1   DPB1   DQA1   DQB1   DRB345  HLA-A  HLA-C
Greedy       11     41     30     29     21      72     47
CFT          10     38     26     27     20      69     48
Same         7      33     22     22     14      62     38
Percent (%)  70.00  86.84  84.62  81.48  70.00   89.86  80.85

Table 6: Comparison of primers from the two different
algorithms.
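
The overlap between two primer sets can be computed as sketched below. Dividing by the size of the smaller set is an assumption inferred from the values in Table 6 (for example 7 shared primers out of 10 gives 70.00), and the function name is hypothetical.

```python
from typing import Iterable


def overlap_percent(set_a: Iterable, set_b: Iterable) -> float:
    """Shared primers as a percentage of the smaller of the two sets."""
    a, b = set(set_a), set(set_b)
    shared = len(a & b)
    return 100.0 * shared / min(len(a), len(b))
```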

Results from combining HLA sequences in order to
differentiate between heterozygous individuals can be found in Table 7.
CFT was only used for the two smallest data sets due to the time
requirements. It performed slightly better than the greedy algorithm on those,
but only by one primer on each data set. There are heterozygotes that
cannot be distinguished from another heterozygote, which can be seen in
Table 7. This happens because the combination of two sequences to form
one heterozygote could result in exactly the same signal pattern as another
combination of homozygotes. In other words, some rows in the signal-
matrix will be the same, leading to some rows in the binary SCP-matrix not
containing any non-zero elements at all. For some of those pairs listed,
this is not true, however. They are listed because there were not enough
primers that have different signals for these pairs, and so they could not meet
the requirement of at least four different signals in the signal patterns
(Table 8). For the rest, it is simply a limitation of this technique to type
HLA-genes. To be able to identify the alleles forming each heterozygote,
primers that amplify alleles selectively should be used in the PCR step.
This will remove the ambiguities, as some heterozygotes will simply be
transformed to homozygotes since only one of the alleles in the
heterozygote will be amplified and not the other.
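
How two allele combinations can end up with the same signal pattern can be sketched as follows. The representation is an assumption made for this example: each allele has one string of signal bases per primer (an empty string for no signal), and the signal of a heterozygote at a primer is taken to be the union of the signals of its two alleles. The names `combine` and `ambiguous_genotypes` are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from typing import Dict, List, Sequence, Tuple


def combine(p: Sequence[str], q: Sequence[str]) -> tuple:
    """Signal pattern of a sample carrying both alleles (union per primer)."""
    return tuple(frozenset(a) | frozenset(b) for a, b in zip(p, q))


def ambiguous_genotypes(alleles: Dict[str, Sequence[str]], min_diff: int = 4) -> List[Tuple]:
    """List genotype pairs whose patterns differ in fewer than min_diff primers."""
    genotypes = {
        (x, y): combine(alleles[x], alleles[y])
        for x, y in combinations_with_replacement(sorted(alleles), 2)
    }
    clashes = []
    for (g1, s1), (g2, s2) in combinations(genotypes.items(), 2):
        diff = sum(1 for a, b in zip(s1, s2) if a != b)
        if diff < min_diff:
            clashes.append((g1, g2, diff))
    return clashes
```

The default `min_diff=4` mirrors the requirement of at least four different signals mentioned above; a difference of 0 corresponds to two combinations that cannot be told apart at all.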

Dataset      DPA1     DPB1     DQA1     DQB1    DRB345  HLA-A      HLA-C
Greedy       26       130      51       81      94      172        94
Time (s)     0.99     9229.57  7.41     294.51  453.19  20826.20*  1212.59
CFT          25       -        50       -       -       -          -
Time (s)     1943.82  -        8427.82  -       -       -          -
Amb. het     0        16       2        2       6       19         4
Percent (%)  0.00     0.58     1.31     0.34    0.95    0.45       0.35

Table 7: Results from heterozygous pairs. Number of primers
needed, the time spent, how many heterozygotes did not differ by at
least four signals from any other heterozygote and the percentage of the total
number of heterozygotes. *Value from a 300 MHz Pentium II with 512 MB
RAM running Windows NT 4.0.

Unfortunately, it was not possible to obtain any results for
heterozygotes for the data sets DRB1 and HLA-B, as these were too large
to run on existing machines. A very approximate extrapolation of the
primers needed for these data sets suggests that the total number of
primers for all HLA sets together would be <1000, which can be placed on one
chip without problem (one chip can contain up to ~5000 primers). Without
the reduction obtained above, at most two genes could be tested on each
chip. With the reduction, all nine HLA genes and the 16S rRNA gene can
be tested on one chip, with plenty of room to spare for other genes as
well. This makes APEX more versatile, as it allows a family of related
genes to be tested using only one chip instead of several.

Pair 1 A*0214 A*0222
Na.d.H. 0 

P.lr 1 A'02>2 A'JIOl 
p l)r i A'0227 A'2(0B 
H«.OI« 2 

f.»< A'**" 

p.lrl A'24QT A'2501 

No. ««<• 0 

P11M A-2402 A-M012 

Pair l A*34fl T A'680J1 

Ha. 4M1. 0 

paid A*250i A*«012 

Pair 2 A'2S«2 A*M8»1 

"7 ■«■" 0 ■ 



Table 8: Heterozygous pairs that do not differ enough in their signal 
patterns, and how many signals they differ with. 

The results of this work are summarised in the following 

Class I    Number of alleles    Primers needed
HLA-A      91                   172
HLA-B      200                  <1000
HLA-C      47                   94

Class II   Number of alleles    Primers needed
DPA1       11                   26
DPB1       74                   130
DQA1       17                   51
DQB1       34                   84
DRB1       192                  <1000
DRB345     35                   94

Table 9: Number of primers needed to discriminate between
heterozygote HLA samples.

Some sets of primers indicated in Table 9, and also the set
indicated for 16S rRNA, are set out in appendix 2.

Primers can be arranged on the surface of a support in such
a way that different studied types, genes, alleles, species etc. form easily
recognised characters such as figures or letters. These character-forming
primers can be additional primers of common origin from the gene of
interest and be used for validation of the process.

The following demonstration is based on the HLA Class II
DQB gene.

Experimental

Materials

Amplification:

DNA: Four cell lines homozygous for DQB, with alleles 0402, 0301, 06011
and 0201.

Primers: Primer DQB 9246 from Williams et al., 1996, and DQB 96012 from
the Amersham Pharmacia Biotech HLA DQB typing kit, covering exon 2 and
generating a fragment of 300 base pairs.

Amplification reagents: PCR mix from the Amersham Pharmacia Biotech
HLA DQB typing kit, a prototype kit.

All amplifications were spiked with dUTP, to get a final concentration of 100
or 200 mM dUTP.

Enzymes for fragmentation of PCR products:
Shrimp alkaline phosphatase (SAP), 1 U/µl, APB.
Uracil-DNA-glycosylase (UDG; called UNG if from PE), 1 U/µl, NE Biolabs.

SAP will degrade (dephosphorylate) all free dNTPs and UDG
will remove all dU from the DNA, and after heating the strands will be
broken at these points. This step is applicable to any DNA fragment.



Primers for spotting:

All 84 primers for the 500 bp fragment were ordered from
LTI/GIBCO BRL Custom primers service. All were 25-mers with an
amino-activated 5'-end. For primer sequences see appendix 1. Self-extended
primers N, A, C, G and T were used as controls, with the following sequences:

N: amino TTT AGC CTT AAC GCC T N TGAC GTCA

A, C, G, T: amino TTT AGC CTT AAC GCC T X TGAC GTCA, where X is
A, C, G or T.

Extension reagents for the APEX reaction

Dyes: Specially synthesised for Baylor by Du Pont and/or APB

Cy2-ddCTP (equal to fluorescein) 50 µM
Cy3-ddATP 50 µM
Texas Red-ddGTP 50 µM
Cy5-ddUTP (often written as T in many of the reactions and results) 50 µM

10x ThermoSequenase™ DNA polymerase buffer (TS):
260 mM Tris-HCl pH 9.5; 65 mM MgCl2.

ThermoSequenase DNA polymerase (Amersham Pharmacia Biotech) 4 U/µl. If
needed, dilute with TS dilution buffer (10 mM Tris-HCl pH 8.0; 1 mM
β-mercaptoethanol; 0.5% Tween-20 (v/v); 0.5% Nonidet P-40 (v/v)). TS was
used from a 150 unit stock and diluted 1 µl + 37 µl dilution buffer.

Methods 

Preparation of glass slides before spotting of primer:

Arrange 25-30 cover slips (24 x 60 mm) in a stainless staining tray.

Immerse the tray in a glass staining dish with acetone to fully
immerse the slides.

Place the glass staining dish in a sonicator for 10 minutes.

Remove the tray from the acetone bath, shake off excess
acetone and rinse several times (at least twice) in MilliQ water.

Immerse the tray in 100 mM NaOH and sonicate for 10 minutes
(a few more minutes is no problem).

Remove the tray, shake off excess NaOH and rinse
several times (at least twice) in MilliQ water.

Immerse the tray in silane solution and sonicate for 2 minutes.

Wash the slides by immersion in 100% EtOH once.

Dry the tray with the slides using nitrogen at a high velocity
(without breaking the slides).

Cure the slides in a vacuum oven at 100°C overnight or until
they are used for spotting (at least 20 minutes of vacuum is needed).

Spotting of oligos:

All spotting was done with a spotter with 96 parallel capacity.
Each slide was spotted with three replicas of the primers.
After spotting, the slides were allowed to air dry for 5 to 15
minutes; when dried they were marked. They were stored at room
temperature, in a dry place, in the trays until used.

DQB amplification

The DQB amplification was done according to the method
described by Williams et al., 1996, using a 33% dUTP mix. After 40 cycles
(95°C, 30 sec; 55°C, 30 sec; 72°C, 30 sec), one microliter of the PCR
products was tested on a 1.5% agarose gel, before the fragmentation step.

Williams, Bassinger, Moehlenkamp, Wu, Montoya, Griffith,
McAuley, Goldman, Maurer: Strategy for distinguishing a new DQB1 allele
(DQB1*0611) from the closely related DQB1*0602 allele. Tissue Antigens,
1996, 48:143-147.

Fragmentation of PCR products:

Before APEX can be done, all DNA fragments must be
fragmented so that all new fragments can get access to the primers on the chip.

Set up:

5 µl DNA from a PCR reaction (1/10 of the PCR reaction)
2 µl SAP (Shrimp alkaline phosphatase), 1 U/µl, APB
1 µl UDG (Uracil-DNA-glycosylase), 1 U/µl, NE Biolabs
15 µl water
Total: 23 µl

Incubate at 37°C for 2 hours.

The samples were frozen and stored until they were used.

Inactivation of the enzymes at 100°C for 10 minutes can be done,
but is not needed since this is the first step in the APEX reaction.

Extension method for the APEX reaction

Slide treatment:

Start by washing the slides in hot water (90 - 98°C, not
boiling) for 2 x 5 minutes in a 50 ml Falcon tube. When the slides are
ready, remove them from the tube with forceps and place them on a dry
heater block at 48°C. The slide (= DNA chip) is now ready for adding the
reactions.

APEX reactions set up:

23 µl DNA from the fragmentation step.
3 µl 10x TS reaction buffer (the rest of the buffer comes from PCR and
UDG cleavage)
17 µl for cover slip method.
Heat denature at 100°C for 7-10 minutes, target 8 minutes, not longer.

Spin the tube quickly and quickly add
1 µl ThermoSequenase DNA polymerase (4 U)
1 µl Dye-mix (50 µM of the four dideoxynucleotides A, C, G, and T,
separately dye labelled).

Then the reaction mix was physically spread out over the
primer array with the tip of a pipette. Incubate at 48°C until no trace of
solution is seen. This takes about 8 minutes.

Wash with hot water for 2 - 5 minutes, 2 times. Ready to
read on detection instrument.

Detection

The detection system is a total internal reflection fluorescence
(TIRF) system, where microscope slides are placed on top of a prism, with
oil used to couple a laser beam into the glass slide. The system has light of
five different wavelengths from five different lasers to choose between. In this
experiment only four were used. To detect Cy2 a 488 nm laser was
used, for Cy3 a 532 nm, for Cy5 a 635 nm and for Texas Red a 670 nm
laser. Image-related software was based on Image Pro Plus.

Results

Amplification of DQB alleles

The DNA from the four DQB homozygote cell lines was
amplified according to the protocol in Williams et al., 1996, with two different
concentrations of dUTP. In addition to this, DNA from six different
heterozygotes was amplified. All amplifications worked well and the
expected 300 bp fragment was seen from all samples.

APEX reaction with DQB chip

Primer chips were washed and fragmented PCR products
were incubated on the chip according to the protocol. The image was
compared to the expected pattern. The expected pattern was similar to but
somewhat different from the recorded pattern. The reason for this is that the
set-up was planned for a 500 bp fragment, but the actual fragment used
was a 300 bp PCR fragment.



Homozygous cell lines results

Figure 4 shows the results from a cell line homozygous for
the DQB 0204 allele. The pattern shown in the image is very close or
similar to the expected results from exon 2.

In all reactions the control primers worked well and the four
dyes were used in the same frequencies. In the case with a 500 bp
fragment for DQB typing, the primers for allele 0402 were placed in such a
way that they formed figures. In Figure 4, panel D, most signals are seen
forming a T from the 300 bp fragment, and the missing signal will be seen
when the large PCR fragment is used. This clearly shows that primers can
be placed in a clever way to form figures.

Heterozygous results

For the heterozygous test only one of the four dye reactions
worked. Some of the expected spots from the heterozygous sample were
not seen, but this is probably due to the fact that no control signals were
seen in the lower right hand corner, where the signals were weaker than in
other parts of the slide.

As this experiment shows, a limited number of primers can be
used for HLA typing, and if they are placed in a clever way the interpretation
of the results is very simple. Both homozygous and heterozygous samples
can be correctly analysed with this method.



Continuation 

An algorithm was developed in order to select the minimum
number of primers needed to identify different genes using APEX. It was
applied to the following HLA genes: HLA-A, HLA-B, HLA-C, HLA-DPA1,
HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRB1 and HLA-DRB345. It was
also applied to the 16S rRNA gene. In the case of HLA-DQB1, the primers
have been shown to work as intended. As is, a few assumptions were
made (such as how many mismatches to allow between the primers
and the sample DNA) that need to be tested and possibly refined.

Another improvement that can be made is the following: As is,
the program works only with discrete signals, e.g. either there is a signal 'A'
or there is not, either there is a signal 'G' or there is not, and so on. A more
precise approach would be to predict how strong the signals will be for
each primer on each sequence. A rough estimate of the signal strength
should be possible given some thermodynamic data about the primers,
most notably their melting points. With this information, and knowing the
concentration of DNA in the sample among other things, the proportion of
primers on the chip that will actually react with the sample DNA should be
possible to estimate. It would thus allow a rough estimation of what
strength the different signals will have. It will not be very precise, and the
estimate might possibly be off by a factor of 2 or more, but it will still give
some information about what signals to expect from the chip.

Given the melting points of the primers, the temperature at
which the reaction on the chip is carried out could be optimised as well.
Since the sequences are known, it is possible to estimate the melting point
of any primer to any sequence when there are a few mismatches. This
could be done for all primers on all sequences, and a range of
temperatures calculated. The actual temperature to use could then be
chosen so as to be as close to optimal for as many primers on as many
sequences as possible, instead of as now at a standard temperature.

Another possibility would be to try other heuristics to solve the
resulting SCP. Even though CFT does give better results than the greedy
algorithm, it is not by much. It could be that Lagrangian relaxation methods
really are not suitable for unicost problems, but the only way to find out is to
try heuristics based on other ideas. It might be possible to reduce the
binary SCP-matrix as well, before applying any heuristic to it. Some rows
in the matrix could end up the same, in which case one of them could be
removed in order to reduce the number of rows and thus speed up
computation. No figures for how many rows might be the same exist, but it
could be worthwhile examining this possibility to reduce problem size.
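
Removing identical rows can be sketched in a few lines. Identical rows impose the same covering constraint, so keeping one copy does not change which primer sets are feasible. The function name and the list-of-lists layout are assumptions for this example.

```python
from typing import List, Sequence


def drop_duplicate_rows(matrix: Sequence[Sequence[int]]) -> List[List[int]]:
    """Keep the first occurrence of each distinct row, preserving order."""
    seen = set()
    kept: List[List[int]] = []
    for row in matrix:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            kept.append(list(row))
    return kept
```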

The algorithm itself could be improved. The complexity of the
redundancy-check phase can be slightly reduced by having a vector
consisting of the sums of the rows in each node. For each child-node, the
column to be removed is then subtracted from this vector of sums. This
operation can be carried out in O(m), and the final complexity will then be
O(m × N(p, p)) instead. For the greedy algorithm, another possible
improvement is to check the primer set for redundancy each time a primer
is added. The complexity of the greedy algorithm will be the same, as
the check will take O(m × p) (i.e. the same as each iteration in the greedy
algorithm) each time (with the improvement just mentioned). The check
could take longer, but that is unlikely as that would imply that one primer
could make several other primers redundant. The main advantage is, of
course, that no redundancy check with its rather high complexity is needed
afterwards.
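
The row-sum idea can be sketched as below. This is a single pass that tries to drop each chosen primer once, not the full search over nodes described above, and the names (`prune_redundant`, `sums`) are choices made for this example. Removing a column only requires subtracting it from the vector of row sums.

```python
from typing import Dict, List, Set


def prune_redundant(columns: Dict[str, Set[int]], chosen: List[str],
                    n_rows: int, e: int) -> List[str]:
    """Drop chosen columns whose removal still leaves every row covered e times."""
    sums = [0] * n_rows                  # how many chosen columns cover each row
    for name in chosen:
        for r in columns[name]:
            sums[r] += 1
    kept = list(chosen)
    for name in reversed(chosen):        # try the most recently added first
        rows = columns[name]
        if all(sums[r] - 1 >= e for r in rows):
            for r in rows:
                sums[r] -= 1             # subtract the column from the sums
            kept.remove(name)
    return kept
```

The order in which columns are tried affects which redundant columns are removed; trying the latest additions first is only one possible choice.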

The most serious problem is the sheer size of the problems.
For the 16S rRNA data set, around 300 MB is required just in order to store
all the primers and their signals. Add to that the fact that all primers need to
be traversed once for every iteration in the greedy algorithm, and the result
is that it will take quite some time as well. This also means that it is not
even feasible to use more elaborate algorithms such as the CFT algorithm
on the 16S rRNA data set, unless a much more powerful computer is
available. On the other hand, algorithm CFT would probably benefit quite a
lot from a parallel computer, since much computation could be carried out
as vector-operations. It should then be possible to spread out all
computations on several processors, thus reducing the time required. It
would also reduce the memory requirements on each processor (but then
parallel computers tend to have enough memory to store all necessary data
for this problem on each processor anyway). Even the greedy algorithm
would benefit from a parallel computer, as each processor can be charged
with the task of scoring only a subset of primers. It is not as critical in this
case, though, since the computation times are not very high when using the
greedy algorithm.

As is, this method is only capable of identifying known gene-
variants. If applied to a sample with a previously unknown variant, it is very
probable that this new variant will be falsely identified as one of the known
variants. It would be very advantageous if this method could be
augmented in some way to recognise this fact, and give a warning if there
could be an unknown variant in the sample. It could be done by giving a
warning when the signal pattern obtained differs from the signal pattern of
any known variant, but this might not be enough. There is no guarantee
that the new variant could not differ in some place not affecting any of the
existing primers, which would lead to the new variant being
indistinguishable from any of the known variants. Some other way is
probably needed as well.
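
The simple warning described above can be sketched as follows. The sketch assumes that a signal pattern is a string with one character per primer; the function name `classify` and the return format are hypothetical. As noted in the text, an exact match does not prove that the variant is known.

```python
from typing import Dict, Tuple


def classify(observed: str, known: Dict[str, str]) -> Tuple[str, int, bool]:
    """Return the closest known variant, its distance, and a warning flag."""
    distances = {
        name: sum(1 for a, b in zip(observed, pattern) if a != b)
        for name, pattern in known.items()
    }
    best = min(distances, key=distances.get)
    # Warn when no known variant reproduces the observed pattern exactly.
    return best, distances[best], distances[best] > 0
```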
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APPEND1X 1 

Primer sequences for DQB heterozygote typing

Primers 'dqb1-1' to 'dqb1-8' placed in positions A3-A10.
Primers 'dqb1-9' to 'dqb1-18' placed in positions B2-B11.
Primers 'dqb1-19' to 'dqb1-30' placed in positions C1-C12.
Primers 'dqb1-31' to 'dqb1-42' placed in positions D1-D12.
Primers 'dqb1-43' to 'dqb1-54' placed in positions E1-E12.
Primers 'dqb1-55' to 'dqb1-66' placed in positions F1-F12.
Primers 'dqb1-67' to 'dqb1-76' placed in positions G2-G11.
Primers 'dqb1-77' to 'dqb1-84' placed in positions H3-H10.

dob1-7 NH2 - GTG GTG ACG CCG CTG GGG CCG ■ CCT 
?ui4 o kS« TCC GTC AAA GGA GTC AG A AAG GGC T 
20 dqbH-8 NH2 "JCC GT^ W w« AC C CCG CAC G 
dqb1-9 NH2 - GAT GTA TCT GGT CAC 

iH!i;iS|fiic 

dqbl-29 NH2 - TCC GTC CCA T "j> £ CGA G GA AGA T 

iilllllli 
Iplllllss 

" " llllilllli 

SsfflBBisssisasf' 
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ffi S ™ : CAA COG GAC ^ °CG OGT GCG GGG T 
dqbl -51 NH2 - ACA TCT ATA ACC GAG .AM i ACT ACCT C 

10 5 dqbl -52 NH2 • GAA CAG CCA ^GA CAT CCT GGAG 

Sqb1-53 NH2-CCTTCT03CTATTCCAGT ACTOGGC 
dnbl -54 NH2 - TTA AGG CCA TGT GCT ACT "CACCAA 
8 -11 NH2 - TTC AGA TTG AGC CCG CCA CTC CAC 5 G 
dqb1-56 NH2 - ATC TGG TCA CAA GAC GCA CGC GCT C
dqb1-57 NH2 - ACT AGC ACA GGC CCT TAA ACT GGT A
^ki *fl KIM2 - ATG TAT CTG GTC ACA CCC CGC AOL> A 

« 8 IS NH2 - ATC TGG TCA CAT AAC : GCA CGC« 3CT C 

j uw? ATC AAA GTC CAG TGG M CGG AAT G 

8?4f W2 - ACG TOG GGG TGT ATC GGG TGG TGA C 
.< ^h J?NH2 ATC AM GTC CGG TGG M CGG AAT G 

dqb1-64 NH2 - CGC TGT CGA AGC GCA CGT CCT CCT C

dqb1-69 NH2 - GAA GTA GCA CAG GCC CTT AAA CTG G
dqb1-70 NH2 - GAA GTA GCA CAT GGC CTT AAA CTG G
S W1 NH2 2 - T^ACA GCG ACG TGG GGG TGT ACC G 

8w^.^^CA0GCCGM^0WGGGT 
dqb1-75 NH2 - TCG ACA GCG ACG TGG AGG TGT ACC G

dqb1-78 NH2 - ATG GGA CGG AGC GCG TGC GTT ATG T
* 8 K SS - GGG TSt CGC CGC TGG GGC GGC TTG A 

dqbl-80 NH2 - ACG <^^^ r C °?^Q^r 
amjh who . TGA TAA GGC CAA GCC CAA mv*h i 

dqb1-83 NH2 - CGT CGC TGT CGA AGC GCA CGT CCT C
dqb1-84 NH2 - GAC TCT CCC GAG GAT TTC GTG TAC C



APPENDIX 2

PCT/EP00/03636

Homozygotes

(From CFT if available, otherwise greedy algorithm.)

DPA1

TTTTTGCCCAGGGCACAG
TTTTTAAGGAAAAGGCTC
TTTTTTGGATCTGGACAA
TTTTTCTGGCCCAGCTCC
TTTTTTTGTACAGACCCA
TTTTTAGGGGACCCTGTG
TTTTTGGCGGACCATGTG
TTTTTCTGCTCATCTTCA
TTTTTGTCAACTTATGCC
TTTTTTCAGGCCGCCAAT
DPB1

TTTTTTTTTTTTCAACCGGGAGGAG
TTTTTTTTTTTTGGCCTGACGAGGA
TTTTTTTTTTTTCAACCTGGAGGAG
TTTTTTTTTTTTTCCAGTACTCCTC
TTTTTTTTTTTTTGCCGTAACTGGT
TTTTTTTTTTTTTGGGGCGGCCTGA
TTTTTTTTTTTTGCGCGTACTCCTC
TTTTTTTTTTTTTGGACAGGAGGAA
TTTTTTTTTTTTCACAGGAGGAGCA
TTTTTTTTTTTTTTGCTCCTCCTGT
TTTTTTTTTTTTGGCAATGCCCGCT
TTTTTTTTTTTTGGCACTGCCCGCT
TTTTTTTTTTTTAGAGAATTACGTG
TTTTTTTTTTTTTCCAGAGAATTAC
TTTTTTTTTTTTAACTACGAGGTGG
TTTTTTTTTTTTGGTCATGGGCCCG
TTTTTTTTTTTTTGACCCTGCAGCG
TTTTTTTTTTTTTACACGTAATTCT
TTTTTTTTTTTTGTAACTGGTACAC
TTTTTTTTTTTTCTGACGAGGAGTA
TTTTTTTTTTTTTTACCTTTTCCAG
TTTTTTTTTTTTCCTGGAAAAGGTA
TTTTTTTTTTTTGAGAATTACCTTT
TTTTTTTTTTTTGCCTGACGAGGAG
TTTTTTTTTTTTACTGGTGCACGTA
TTTTTTTTTTTTTCCTCCAGGATGT
TTTTTTTTTTTTCGGGAGGAGCTCG
TTTTTTTTTTTTAGCCAGAAGGACA
TTTTTTTTTTTTCAGCCAGAAGGAC
TTTTTTTTTTTTAGTGCCGGACAGG
TTTTTTTTTTTTATTGCCGGACAGG
TTTTTTTTTTTTCCTGCAGCGCCGA
TTTTTTTTTTTTAGAGAATTACCTT
TTTTTTTTTTTTGGACTCGGCGGTG
TTTTTTTTTTTTACTACGAGCTGGG
TTTTTTTTTTTTGCTTCGTGCTGGG
TTTTTTTTTTTTGTCCCTGGTACAC
TTTTTTTTTTTTGCGCTGCAGGGTC

DQA1

I 1 1 1 II I M I 1 1 ACATCCTCATCTG 
I I I I 1 1 I I M I I AC ACCCTCATCTG 
M I I M I I II T TCAAGTTTACACCA 
M I I 1 1 I I I I I I CAGCCACAATGTC 



H M I I II I I I IA ATTCATGGCTGT 
Ml l llUin I A CAATCCCAGGGC 
IM I IIHUM ACAACCCCAGGGC 
50 I l I I I I M I I I I GTGGGCATTGTGG 
M I Ml M I I I r CCAACACCCTCAT 
M M I I I U I I r GGCCCACAGACAA 
M I I 1 1 I I I I I TCATG GGC ATTGTG 

II M II I I H T TGGCCTGGATGAGC 
55 TTTTTTTTTTTTAGGCTCATCCAGG 

I 1 M I I 1 1 I I I I CAACACCCTCATT 

I II I M 1 1 M I I AGCACTGGGGACT 
I I 1 M I 1 1 1 II I AAGGGCC ATTGTG 



45 




TCCAAGTCTCCCG 
"CGGGAGACTTGGA 
TTTTTTTTTTTTAAATTCATGGGTG

I I I 1 I I I I I I I I CACCATAAGAGGC 

I I I 1 I I 1 I I 1 I I 'CACCACAAGAGGC 
\ I U I III I I I I CACCGTAAGAGGC 

1Q 5 1 I I I I I I I I II I I ICCTCCCTTCTG 

M I I I I I I I 1 11 I A ACTCTCCTCAG 

I n | | | | | | I I I I A AATGTC ATCAG 

I I 1 I 1 I I I I I I 1 CTCCTCCCTTCTG 

DQB1 

10 I 1 1 I I 1 1 I 1 I I I ATCTTGCAGAGGA 
15 | I I I I 1 I I I I I 1 CCTCTCCAGGATG 

1 I I M I I I I I I I GGGTCACCGCCCG 
I I I I I I I I I I I I GGGAGTTCCGGGC 
M I 1 I I I 1 I I I r CGCTCGGGTCCTC 

15 1 ) I I I I II 111 r CCAGTACTCGGCG 

I I I I I I I I I II I CTGGGGCCGCCTG 

m il ium i atgtctacacctg 

20 T I I 1 I I I I I M I A AAGGGCTTCTGC 

I 1 1 I I I I I I I 1 I AGCATCACCAGGA 
20 I I I I M I I I I 1 i GCCAGGAGGAGAC 
1 11 I ill I I I I I ACCAGGAGG AGAC 
1 1 1 I I I I I I I I I GGTTTCGGAATGA 

I | | M I I I U I I GGGTGTATCGGGT 
25 1 1 I I I I I I I 1 1 I GTCGGAAAGGGCT 

25 III I I I I I H III GGTTTCGGAATG 

I I 1 1 I I I I I I I I CCAGTACTCGGCA 

I U I I N || I I I A GCGCACGATCTC 
1 1 1 1 1 1 I I 1 1 I i 'GTCTCTTCCTGGT 

I I I I I I 1 1 Ml I CGTCAAG CCGCCC 
30 1 1 1 I I I I I I I I T GCGTCAAGCCGCC 

30 1 I U I 1 1 I I I 1 1 CAAGGTCGTGCGG 

| | I I U I I I I N CGGTTATAGATGT 

I I 1 I I I I I I I 1 1 I GTAACCAGACAC 

I I 1 I I 1 I I I 1 1 1 GTATGCAGACACA 
35 1 1 I 1 I I I 1 1 I I ICACACCCCGCACG 

1 1 1 1 I I I I I I I I ACACCCCGCACGC 

DRB1

min i mi i gcaagtcctcctc 

TTTTTTTTTTTTTTCTCCTCCCGGT 
40 II I I I I I I I I I rC CACAACCCGGTA 
TT 1 1 1 1 1 1 1 1 I TGGCCAGGTGGACA 
1 1 I I I I 1 1 I I I I 'GCGGTTCCTGGAG 
40 I I I I I I I I il l I CAGCCAGAAGGAC 

I 1 1 I I I 1 1 I I I TGACTCGCCTCTGC 
45 | I I I I I H t I I I I C C AGGACTCGGC 
I 1 1 I I I I III 1 1 GAAATAACACTCA 

I I HI I I I I I I I I GGAGGACAGGCG 

I I I I I I I 1 1 1 I I ACGTGGTCGGGTG 
1 1 I I I I I I I I I I IACTCCAAGAAAC 

45 5 0 | | | | | | I I I . I I A CGGTGTCCACCT 

I I I I I I I I I I I I GGAGAGGTTTACA 
I I I I I I I 1 1 I 1 1 CCAGTACTCGGCA 
M l I I I I I I I I I GGAGTACTCTACG 
H 1 1 I II I I I I I GTGTAAACCTCTC 
55 I I I I I I 1 1 I I I I CGGTGCAGCGGCG 
_ I I I I I I 1 1 I I I I GGAGGAGTTCCTG 

W l ) | | | | 1 1 I I | I I GGAAGACGAGCG 

I I I I I I I I I I I \ CAGGAGGTTGTGG 
TTTTTTTTTTTTGACAGGCGCGCCG
TTTTTTTTTTTTC CGTTC AG GAACC 
I H M I I I I I 1 1 GGAATCCTCTTGG 
I 1 I I I I I I I I I TGCCACAAGAAACG 

5 1 I I I I I I I I I I I ACGTTTCTTGGAG 
1 1 1 1 I I I I I I I r CGGACTCCTCTTG 
I 1 1 I 1 1 1 1 1 1 1 1 I ACGGGTG AGTGT 
1 1 1 I 1 1 1 1 1 1 1 1 CCAGGAGGAGTTC 
TTTTTTTTTTTTGTAATTGTCCACC 

10 M I I I 1 1 I I I 1 1 I CGTAGCGCGCGT 
1 11 | | | | | | I I 1 AAGATGCATCT AT 
TTTTTTTTTTTTTACGTCTGAGTGT 
I I I 1 I 1 1 1 1 I I rCCAGTACTCAGCA 
I 1 1 I I I I I I I I I CGTAGCGCGCGTA 

15 TTTTTTTTTTTTATCTCTCCACAAC 
I 1 1 H I I 1 > 1 1 1 GAGCTCCTCCTGG 
1 | | | | I 1 1 I I I I AACCAGGAGGAGT 
1 1 1 H I I 1 1 I 1 1 AGGGCCCGCCTGT 
1 1 I 1 1 I I 1 I I I I GGAGAGCTTCACA 

20 1 H 1 I I II I H I GGAGAGATTCACA 
TT I I I I I I I I I I ICACCGCCCGGTA 
T M M I I I I I I I A ACTACCGGGTTG 
TTTTTTTTTTTTCC AGTACTGG GCA 

DRB345

25 I I 1 1 I 1 1 I I I I I GTATCTGTCCAGG 
1 1 1 1 I 1 1 I 1 1 1 1 GACTGGGGTGGTG 
1 1 1 I I M I 1 1 I I CTGTCGAAGCGCA 
TTTTTTTTTTTTGTGTAAACCTCTC 

I 1 1 I I 1 1 I II I T CTGTGAAQCTCTC 
30 I II I I 1 1 I H r rCACCAGGGCCCGC 

I I Ml I I I 1 1 I IGGCCAGGTGGACA 
TTTTTTTTTTTTGCGGTTCCTGGAG 
I II I I I I 1 1 I M I CGAAGCGCGCGT 
M M 1 1 I 1 1 I 1 1 I A ACCAGG AGG AG 

35 I 1 I I 1 1 I I I I 1 1 ACGTGGTCGGGTG 
I I I 1 1 I 1 1 1 I I I A GGGCCCGCCTGT 

I M 1 1 1 1 M 1 1 I GGGCCCGCCTGTC 

I I I I 1 1 I I I I 1 I A ACTACGGAGTTG 

I I I I I I I II U 1 GGGGCCGGGCTGT 
40 1 1 I I I I 1 1 1 1 I I GACCATGTTTCTT 

i 1 I I I I 1 1 M 1 | CTGTGCAGGAACC 
IIIIIHIIHI GGCCGGGCTGTTC 

I I I 1 11 1 1 1 1 1 1 ACATCCTGGAAGA 

I I I 1 I 1 1 I I I i I CTCACGAGTCCTG 

HLA-A

I I I I I I I I I I I 1 1 CAGTCTGTGAGT 

I I 1 1 I 1 1 1 1 1 1 1 A GACGCATATGAC 
H 1 1 I I I I I I I I G GACGCATATGAC 

I II |l 1 1 | I I I I GGTCGCCAGGTCC 
50 I 1 I I H 1 1 1 1 I TCCGCAGGCTCTCT 

II M I I 1 I I I 1 1 1 CCTCCTCCACAT 

1 I I I 1 1 I I 1 1 I rCCGAACCCTCGTC 
I | I 1 I I I I I I I I ATTTCTCCACATC 
TTTTTTTTTTTTGGCGGACATGGCG 
55 1 1 11 I I I I I I I I CCAGAGCGAGGAC 
H I IUHIIII I I CACCACATCCG 
t U I I I I I I I I I GGGAGCCTGCCCA 
HI i 1 1 M | | || 1 GATGTGGAGGAG 
TTTTTTTTTTTTGGAGGAGGAACAG
T M I II I I I I \ TAGTCATATGCGTC 
T 1 I I I I I I I I I I GGTCTGCCCGAGC 
IIIH UIII 'l A MCCTGCCy TGJ 
1 I HUIIII 1TCCGGGACACGGAA 
i ll 1)1 1 I I I I r CGTCCTGGGGGGG 
XTTTTTTTTTTTCCGCTGCCAGGTC 
11 1 l | | I II I I IATGGGTCCTGGGG 
H I I 1 I H I " rA TGCGTCTTGGGG 
- [ 1 1 1 t I H I I I T GGAGAAGAGATAC 
M 1 I I I M I I I TGGGAGCCCGCCCA 



1 1 1 | I I I I I I I TCCGCAGGTTCTCT 
1 1 1 I I 11 1 1 i I TGCGCAGGTCCTCT 
THIIIIII | I r GGGCGGGCTCTCA 
15 imlllllll 1 CCAGGACACGGAG 
TTTTTTTTTTTTCCGGCAGTGGAGA 
I 1 1 I I M I I I I I AGGAGACAGGGAA 
! I II 1 1 1 I II I TG TCAATCTGTG AG 

I 1 1 I 1 1 I I I I I T AGAAGTGGGTGGC 
20 TTTTTTTTTTrTCAGGTAGGCTCTC 

TI M H I I I I TTCGGACGCCCCCAA 
millllll I TTCAATCTGTGAGT 
T 1 M I 1 1 1 1 1 I T TGAAGGCCCAGTC 
1 1 1 I 1 I 1 1 I I I TCGTCGTAAGCGTC 

25 TTTTTTTTTTTTAACCAGAGCGAGG 
IMIMIIIU [1 GACGGTCATGGC 
IIIUir GGACCTGGCGAC 
MHIIIUU rGAGAGCCCGCCCA 
Til I I Ml r TTTTCATATTCCGTGT 

30 1 I II I Ml n I rGGGAGACACGGAA 

I I I M I I H I I TGTCCACTCGGTCA 
T il 1 1 I 1 1 M I r CCGTGTCTCCCCG 
mill HI IT TGCTGCCACGTGGG 
n i lll l lll I TCGAACTGCGTGTC 

35 I I 1 1 I I I I I I I I GGTAGGCTCTCAA 
Tit I 1 I I 1 1 1 1 l AGGTCCACTCGGT 

I I I I I H I I I I I GTCCTGGGGGGGT 
TTTTTTTTTTTTGCTGCTCMCCGC 
nilllllll IT GGGGCGCCATGAC 

40 TTTrTTTTTmGCGCGATCCGCAG 
IIIHIHHI rGCACATGGCAGGT 
T 1 I I I II M T T T AGGAGAAGAGATA 
H 1 1 I I I II r fTAGGAGCAG AGATA 

mil i um i ccactccacgcac 

45 im i l ll inr C CCGTCCACGCAC 
T l lll l lll.l rCACGTGCCATCCA 
T u I I M I I II r CCCGGCCCGGCAG 
1 1 1 M I 1 1 I I I I CACGTCGCAGCCA 
TTTTTTTTTTTTACGTCGCAGCCAT 

50 1 1 1 1 I I I I I 1 I I ACGTGGCAGCGAT 

TT Ttmrr^A TCC*^ 

nniHUI i r cGAGCTCCGTGTC 
I I I I I I I I I I I I AC CAG AG CG AG GA 
1T1I1 I1 I M rTATGAACAGCACGC 
55 Til Ml I I I I I TTCACACCCTCCAG 
H i I I I ITCTACGTG GACAAC 
HLA-B

I IHIU1IU I GGATGGCGCCCCG 
H II IIH I HI CGGCTCAGATCTC 
| | | HI | I I I I I I CGGGGCGCCGTG 
5 nnin il lll CTCCACTGCTCCG 
| 1 1 | 1 1 I I i I I I I GTGTTGGTCTTG 
i 1 | | I I I I I I V TGGGTATGACCAGT 
TT I I M I H 1 1 I I CCAGGTG ATGTA 
11 1 I I I I I I I T TGTCCTGCTCCGCC 
\ o T ! 1 1 1 I 1 1 I I I T TGTAGTAGCGGAG 
TT H I I I Hi r TGCTCAGGTCCTCC 
1 I I I H I I I 1 I IACCAACACACAGA 
T 11 I 1 I 1 1 I 1 1 r CCGTCGTAGGCGT 
1 I 1 1 I I I I I I I I GTGAGCCTGCGGA 
15 j i H | | I I I I I I AC ATCATCC AG AG 
T i | | | | H I I \ I G GTTCTCTCGGTA 

I I I I 11 I 1 1 1 1 TTGATGTGTCTCTC 
20 TT M I I I I I I IT GCGCCATGACCAG 

TT1 I 1 1 H I I I I GGCGTCCTGGTCA 
20 | l | | | | I H 1 1 I AGGAGGACCTGAG 
T U II I I I t i l I GCGCCAGGCACAG 
f t I I I I I I 1 1 I I A GGAGGGGCCGGA 

I I I 1 1 I I t I I I I CCGCTGCTCCGCC 
T TI I I I I I H * I A CACCATCCAG AG 

25 25 1 HIIIIHIII CACACAGATCTAC 

I H M I I 1 1 I I I GGGCATGACCAGT 
H I I I I 1 1 1 M rCACACAGATCTCC 

1 1 1 1 1 1 1 1 I I 1 1 GCGAGTGCGTGGA 
|| M | 1 1 1 1 1 1 1 r GGTACCCGCGGA 
30 1 11 1 1 1 I U I I I CCTGTGCGTGGAG 
30 1 1 1 1 1 1 1 1 1 1 1 IA GAGACAGATCTT 
Tl 1 II I II I I I f CAGCGACGCCACG 
HI IH rCGGGCCGGGACAC 

I I I I H I I I U I CCCGTCCCAATAC 
35 Tl 1 1 M I I 1 1 I T GGGCATAACCAGT 

IMIIIIIIII I GCCCCGCTTCATC 
Ti l II IH li I rCAGGAGCGCAGGT 
35 TTTTTTTTTTTTCGTCCACGCACAG 
TT I I I I H I I I IGAGTCCGAGAGAG 
40 TTTTTTTTTTTTGACACAGATCTCC 
| | | | I I I I I M I I AACCAGTTAGCC 
T 1 1 1 1 M 1 1 1 1 M AGGCGTGCTGGT 

I I I I I I 1 1 I 1 1 r GACCCTGCTCCGC 

I I I 1 1 I II I I IT GGGGCTCCGCAGA 
40 45 H I I 111 I i I I TCCGGTCCCAATAC 

U 1 1 U 1 1 I I I T GCGGGTCACGGCG 
nil 1 1 1 1 I I* I A GGGCCAGGGCTC 
TTTTTTTTTTTTATCCTCTGGAGGG 
| 1 I i I 1 1 1 1 I I I GGCAGACGATGTA 
50 1 I I I I I H I I I IA GGCGGAGC.AGGA 
45 i nill l lUH CAGCTGCTCCGCC 

I 1 1 i n | | | | | 1 ATCTGCGGAGCCA 
n 1 I I I I 1 1 I I I CGGAGCTGTGGTC 
T 11 1I II UIH CGACCACAGCTCC 
55 1 1 1 1 I I I I I 1 1 1 GAAGAGTTCAGGT 

I I I I I I I I I I I I CATGTCGCAGCCA 

I I I I I I 1 1 I I I I CTGGGCTGGCTCC 
} | l M I I I I I I I CAACACAGAGACT 
HHHMinil GGCGGAGCAGGA 
TTTTTTTTTTTTTATGACCAGGACG
1M 1 I \\ I I II I CCACTGCTCCGCC 
1 1 [ I I I I I t I I I A TGACCAGG ACGC 
T'T 1 I I I I I I I I " rGGAGGGGCCGGAG 
5 mn m i l l I IGCGTGGACGGGC 
TT1 UI1IUII AGATCTGTATCTC 
T l I M I I I I 1 1 I'GCGGGTCATGGCG 

I I II I I I I I I I rCCGGGACATGGCG 
T i I I I 1 I I I I I r CCACAGCTGTCCA 

in TT U I U I I H rCGGGACATGGCGG 
T1 l I I II I I I I TCCCGTCCACGCAC 
TT I I U I I H I I GAAGTGGGAGCCG 

I I I II I I I HI I 1 1 CCCAATCCACC 
1 1 I M I I I I H I CCCACGATGGGCA 

1 5 T I I 1 1 M I I I M 1 I CCCAGTCCAGC 
TT 1 1 I I I I I I I rGAGATCTGAGCCG 
1 1 1 1 1 1 I I I I 1 1 I CCACGCACTCGC 
M I I 1 1 I I I I I r GACAGCGACGCCA 
■ni l l imiH CGCCGCGGACACC 

20 HIIIIIMII I GTAGGAGGAmoMj 
IT I 1 I I I I 1 I I I CTTTTCCACCTGA 
1 1 1 1 I I I I I I 1 1 CACGTCGCAGCCA 

h i hhhiu caggtcgcagcca 
i hi 1 1 i i i i i i cgtagcccactgc 

25 1 I 1 1 I I 1 1 I 1 1 I A TCCAGGTGATGT 
UmHIINII CCGAATCCACCG 
U 1 1 HI I I I I I GGGCGCTTCCTCC 
1 1 1 1 1 1 1 1 1 1 1 ICCCGCTTCATCGC 

I | | I II I HI I T CCCCGCTTCATCG 
30 1 1 1 I I I I M I I \ CACAC AGACTTAC 

H I I 1 1 1 1 I I I i'AGGACGGTTCGGG 
TT! I I I I I I I I TCCCCGAACCGTCC 
T l 11 1 1 I I H rTGAGCTCTTCCTCC 
1 1 1 1 I I I I I I ITGCTCCCGAGAGCA 
35 T I I I I I M II I I ACTCCATGAGGCA 
T U I IUIII rTGCTGTGGTGGTGC 

I I It I I I 1 1 1 M IT GTCCAGAAGGC 
H 1 1 1 I I 1 1 II I rGCCCGCGGAGGA 
TT 1 1 I Ml 1 I IT GCCGCGGACAAGG 

40 | 1 1 | 11 , 1 1 m CCGCCTTGTCCGC 
| 1 1 1 1 I 1 1 I I rr CGGGTACCACCAG 

HLA-C 

Ti l IH I II HU GAGCTGGGAGCC 
I 1 I I 1 1 I 1 1 I TTGGTGCAGGGCTCC 
45 T 1 1 1 II 1 1 I 1 1 I GGGTGCAGGGCTC 
TT 1 I I 1 M 1 I I I GAGGCGGAGCAGC 
IT I I I I I I W ITA CGGCGGAGCAGC 
Tl 1 I I I I 1 1 H T GCGGCGGAGCAGC 
I H 1 U I 1 1 I TTAGCGCGCGGAACC 
50 T T 1 I I I 1 1 I I I TCGGCCCAGGTCTC 
TTT I 1 H I M I 1 1 GGCTCCCAGCTC 
TTTTTTTTTTTTGCGCGCGGAACCC 
II 11 1 II I I II I A CGGCTTCCATCT 
TTTTTTTTTTTTGGTTCGGGGCTCC 
55 1 1 II I II I I I I I' A CTCGACGCACAG 
H 1 1 H I II I I [ TGGAGCAGGAGGG 
TTTTTTTTTTTTGCGCGCAGAACCC 
TTTTTTTTTTTTTGAGTCTCTCATC 
UIHI I IHH CCTGCAGCCCCTC 
TTTTTTTTTTTTCCGCCGTGTCCGC
TTTTTTTTTTTTCCGCTGTGTCCGC 

I n M l II II 1 1 I CCAGAATATGTA 

mi l ium i cggggagccccgc 
,n 5 ttttttttttttgccgtcgtaggcg 

M l I I I I 1 I I I iccgccaggcacag 
i 1 1 i I I I n I I I gcgccaggcacag 

I I I t I I 1 I I I I i gtagccgcgcagg 
1 1 i n i m i i i gctggacgcagcc 

i o i u i iiiim i i iccagtggatgta 
I I II I II It I I I ic cacgcacaggc 
15 m i n i mi i gccgtgtccgcag 

I I I I 1 1 I I I M i gaggggagccccg 
m il ium i 'cgtgtcccggcct 

i s m 1 1 1 1 1 n < i ggcatgaccagtt 
t n I I I I I i i i v ggtatgaccagtt 

I I I I I I I I III I gacaaccaggaca 
n i i i iii i i i igaatatgtatggc 

20 I \ \ | | I I I II I i gacagccaggaca 

20 7 M I I HI I I I r CTGGCTGTCCTGG 
M I I I I I I I I I I CTCCTAGGACAGC 

II 1 1 I I I 1 I I I rAGGGCCAGGGCTC 
1 1 I I I I I I I I I I I A TAACCAGTTCG 

I 1 11 II I I I H I CATAGGAGGAAGA 
25 25 TTTTTTTTTTTTTGTGGAGACCAGG 

T 1 1 | I 1 I 1 I I 1 I 1 GCTCTTCTCCAG 

I I I I I I I I 1 1 I I G AAGAATGGGAAG 

I I I I 1 I I I Hi I 1 GCGGAAACTGCG 

16S rRNA 

30 1 1 1 1 1 1 1 1 1 1 1 1 I A GCCGCCTGCGT 
30 H 1 1 I M I I I I I GGCCGCAAGGCTG 

1 1 1 I 1 1 I I 1 1 VT GAACTGCCGTTGA 

I H I 1 1 1 1 I H I A GACTGCCGCTGA 
1 1 1 I I I I I I I I I I I A TTCGGAATTA 

-*5 TTTTTTTTTTTTTTGCACCCCTTGT 

I I I ) I I I I 1 1 I I CGCGAGGTTGAGC 
„- 1 1 I 1 I I 1 1 I I I IT ACCCCCCATTGT 
35 I U 1 11 M l I rTCATTTGATACTGG 

1 1 | | | II 1 1 1 I rGTGTGCCTAATAC 
40 M 1 | I I I 1 1 11 I I ACGACTTAACCC 
Tl 1 1 I 1 I I I I 1 1 CCCGGCCTTTGTA 

I M I I I I 1 1 1 1 I GGGCAAACTGGAG 

I I | I I I I 1 I I I I GA^"^" GATCCTGG 
40 MM 1 1 II I IH' GAC , CCCGAAGG 

45 1 1 I I 1 I M I I I I G AAGTCGTAGCAA 
1 M I I I H M 1 CGCTGCAGAGATG 
i i 1 1 1 1 1 1 I 1 1 I TA CCCTACCTACT 
1 H 1 1 I I I I 11 I GAGGACCTTCGGG 

I | 1 1 1 | 1 II 1 I I AAGGGCCATTACC 
50 1 1 1 1 I I I I 1 I I 'TGATAAACGCTGGC 

45 J} i | | l | 1 I I I I GACTAGCTACTCC 

I I I I I I I i I 1 1 I ' ACATCCGGTGTTA 
1 1 I I I 1 1 1 1 1 1 I A TCGCAGGCCTTG 
M 1 1 I I I IM H rCACCAAGTCGCT 

55 1 I I I 1 1 I I I 1 1 I TCCCTCCTTTCGG 
1 1 1 I 1 1 1 1 I I 1 I 11 I AAACGCTGGC 
II | | i | | | I 1 1 I CGAAACCGCAAGG 
50 i mi i u l lll GCAAGCGTCCTCC 

n M I I I 1 I I I I A CCAAGGACGTTT 
TTTTTTTTTTTTCTAATACCCGGAG
| 1 1 I I I I I I I I IACTTTCAGTGGGG 
UIIHIMUI CTGCGTGAAGTCG 
1 l | | 1 1 I I I I I I AATAGCCCACCAA 
5 1 I I I I 1 1 I U I I AACGGAAACGGGG 

I 1 1 I I I I I I 1 1' rGGATTGCACTCTG 
II I I I I I I I U I T AGCCTTGGGGAG 
II HMIIIH I CGCCGCATGGCTG 

I I M I I I I I 1 1 I GCATAAGGGGCAT 

i o m i nium iaccacatctctg 

T I I I I I I I M I I GTTACCGCGAGGA 
1 1 ti l lit I I I I GGCTTTCAGAGAT 
T 11 1 1 1 I 1 I I I 1 CGCTGCTTCGCTG 

I 1 1 1 I I I I I 1 I I I AGCGCTACCTTG 
1 5 l I I I 1 I I 1 1 I I 1GCA CCACC TGTCA 

II I H I I I I I I I t GAGTTTTAACCT 
1 1 | I 1 1 1 H t I 1 CTAATACGGGATA 
I I 1 1 1 1 1 1 1 1 1 1 AGGAGAAAGCTTG 

I I I I 1 1 1 1 H I I I IA AGAGATTAGC 
20 1 I I I I I I t I H I 'GTAGCATTCTGAT 

I I t I I I t I 1 1 I I AGGCTTTCCCCCA 
1 1 I I I I I I I II I AGAAGTAGCTTGC 
I II II I M I I IMC GCGTATCATCG 
1 1 I I I I I I 1 1 1 I I TC AGAGATTAGC 

25 T1 1 1 1 I I I I M I TCCGAAAGCGTGG 
1 1 1 1 1 1 I 1 1 I 1 1 1 ACAACCCGAAGC 
I I I I H 1 1 1 M I r GTCATGGCTCAG 
1 II I 1 1 I I 1 I I I CGTAGGCTTGGTG 

II HI N I I I i 1 GTGGAATTCCACG 
30 I II I M 1 1 HI I A CGGTTCCCGAAG 

I I I H I 1 1 I I I I A ACTCGAGTGCGT 

I I I 1 1 1 1 1 1 1 1 1 I G ATGTGCTATTA 

I I 11 I 1 I I M I I AAGCAGGGAGGAA 

I I I 1 11 1 I I II I CTGCTGCAGTGAA 
35 1 I I ! I I I I I I 1 I I I GGGATTAGCTC 

T 1 I I I I 11 1 I I 1 CCTTTGATACTGG 

II I H I 1 1 1 I I I GGACGCTAGCGGC 
I I II I I I I I 1 1 i GTTTACTACCCAC 

I 1 1 I 1 1 1 1 I I I I CGCGATCTCTAGC 
40 I 1 1 II I I I I I I t lAGGCCGTrCCCC 

I 1 1 1 1 1 1 1 1 1 1 1 A CGCGTTGCATCG 

I I 11 1 I 1 1 I I I I GCCCGTCAAGCCA 
71 1 M I I I 1 1 I I A GTCCCCGCCATT 

I I I I I I 1 1 I I 1 1 CTAGCCGTAAGGG 
45 II M I I I I I I 1 1 IGTCCTTCGGGGG 

1 I I I 1 1 1 1 1 1 1 1 AACCAACTCCCAT 
T I 1 I I III I I I I ACTGTGGGTAATA 

I I I I I I I t l I I I CTGAAAGATGGCG 

I I I I I I I I I M 1 CGAAAGCCAGGGG 
50 | I I II I I 11 I I 1 GTCCGGAATTCTG 
TTTTTTTTTTTTC AG AAGTG GG TAG 
I I 1I IIIII I II1 CAGTCCTCATGG 

I | II I 11 1 1 I 1 I GAAAGAAGCTTGC 
| I I I I I I I 1 I I I GACCACCTGTCAC 

55 I H I UI I IIIIIU GGAACTGCAT 

II 1 I 1 1 I I 11 I I ACAGTTCCCGAAG 
T I I | I I I t I II I CTCATATCTCTAC 
H I I I I I I I I I I T TCAGTGAGGAAG 
I 1 1 1 I I U I I 1 1 ACTGTGAGGAAGG 

60 1 I 1 I I t I I I t 1 1 CCCAGCCCGTAAG 
TTTTTTTTTTTTCGTAGCCTTGGTG
I I H I H I II I I ATGATGCGTAGCC 
T 1 It 1 1 I I I 1 1 t 'AGGCAGTGGCTCA 
I MHIIIUI I CAGGACTTAACCC 
5 TTTTTTTTTTTTGGCCAGGCCGTAA 
10 i | | | I I I U I nTCCAACTTCGTGC 

I l I I I I I I I I ITGAAGCGTGTGTGA 
llllllllHI I CTCCCCCGAAGGT 
T 1 1 H II I I I I r ATGGGAGTTTGTT 

10 II I I 1 I I I m I ' GTGTGCCGTTACC 

II I 1 I I I I I I I T AGCAGTGAGGAAT 
15 1 1 1 1 H 1 1 I 1 1 I 'GCCCCGGTTAACT 

7TTTTTTTTTTTG C ACCG GCAGTCA 
| | n | II I H I t GGACCTTCCTCTC 

1 5 I I I II I I 1 1 I I I 'ACCTAGGTGGGAT 
1 1 1 1 I I i I I I 1 I AATAGCTAATACC 
U I H I IIUII GCCATATCTCTAC 
1 1 1 1 I H 1 1 I 1 I GCCGGTGGGGTAA 
20 1 II 1 I I I I N I I f ACCCCACCTTCG 

20 11 1 1 I II 1 1 I I T CAAGGCCTGGGAA 
1 1 1 1 1 1 1 1 I I 1 1 CAACCCTGGTGGC 
1 1 I II I I H 11 I CTAGTCATCCAGT 
m ill IH III GGCTGCTGCCTCC 

I I llll I II I I I CCCAGAGCTCAAC 
25 TTTTTTTTTTTTGAAAGCTTGATCC 

25 Till IH llll I AACACGCTGGCAA 

I I I I 1 1 I 1 1 1 I rGAGCTTGCTCCCC 
1 1 | 1 1 1 1 1 1 1 1 1 ATTTAGTTGAGCA 
11 ! 1 1 1 I 1 1 1 1 1 CGACTTAGGCTCA 
U 1 1 1 1 1 1 1 1 1 1 T TGATGTGCTATT 

1 1 1 1 I li 1 I I I C TTAGGTGCCAGC 
TTTTTTTTTTTTGGCTACAGATCGT
1 1 1 1 1 1 1 1 1 1 i I A ACTTGCGTGCAT 
1 1 H I i I I Hi 1 GCGATTACGTCAA 
35 I I I II 1 1 I 1 1 I I GGACGTTGGCGGC 
71 1 II 1 1 1 1 1 1 1 I GGTGGAGGATGT 
1 1 I I I M I I I 1 1 A TAAACCATGCGG 
lll lll l lHI I A AGAAGTGGGTAG 
35 ll llllllHI I AACAAGCTAATCC 

40 1 1 I I M 1 1 1 I I I ICCATGGTTTGAC 
- 1 1 1 1 1 1 1 1 1 1 1 TA GTAACTGCCGGT 
| | I I 1 1 1 1 1 1 I I GAAAAGGGGGCGT 
| | 1 1 1 I 1 1 1 1 1 1 GGCGCTTGCGCTC 
I I 1 1 1 I 1 1 1 1 I TGCTACCTACGTGC 
45 I II I II I I H I I rGCGAGGTGGAGC 
40 U 1 I I Ml I I I I CGCGAGGTGGAGC 

1 1 1 1 1 I 1 1 1 1 1 T GCTACCTACTTCT 
[ || | 1 I I I I I I I 1 1 A ACACATACAA 

I 1 1 1 | | 1 1 1 1 1 1 T G TTGTGAAATGT 
50 Ti l 1 1 H II I 1 1 CGTAAAACTCAAA 

1 ! ! 1 ! I II 1 1 I IC AAGGGGCAAGT 
45 iM IIIIIIIIM CCMCCTTGCGG 

TT 1 1 1 I I I H I I GGAGGAACGTGGG 
| 1 I 1 1 I 1 1 1 I I I A TAAGCCTCTGAG 
5 5 1 1 1 I 1 1 I 1 I I I I I ATGCTAATCCCA 
IT I 11 I I I 1 1 I 1 GATGCTAATCCCA 

II I H I I I I I I I GCCAGTGTTCGTC 
1 11 I I I I I I I I I GTAAAGGTGGGGA 

50 Till HUH IMI AACACACCGCC 

60 T1 1 I I I I I W I I CCAAGGCGGTGAT 
TTTTTTTTTTTTGCTACGGCTAACT
I [ I I 1 1 1 I I I 1 I AGTCGAGCACTCT 
1 1 I I 1 1 I I I I I I AAGGGTAGCTAAT 
1 in | n U 1 1 T GTCACAGTACGAG 
5 1 I H I I I I 1 I I T TGAAAGCACTTTA 
1 1 i I I I I I M I I GGCGCAAGGCTTA 

I I I I I I I I I I I I GCCTAGGTGGGAT 
HM I MH I tl GTCCCCACGTTCC 

I I I I 1 1 I I M I I GGCCACAAGGGGA 
10 TTTTTTTTTTTTCTAGCTGTAGGGA 

1 1 | | | I I It I I IGTGGGCAGCAAGC 
HI | I I I I I H I I CG AAAG ATTAAA 

I I 1 1 I I I 1 1 I M GGAGTATGGTCGC 
T l 1 1 I I I I I H I CGAGATGTGAAAG 

15 1 1 1 1 I I M I I M GGGCAGGCTAGAG 

I H 1 I H I I I I I ACCTCCTG AGCCA 
XTTTTTTTTTTTTCCACCGCTACAC 

I I I H 1 H I I I Hi I CAGTCTTGCG 
Til H I II H I I CTTGACGGGCGGT 

20 I II I I H I H I I ACGGTAAAAGATG 
jXTTTTTTTTTTTTCACCCTTGCGG 

I I I I I i I I I I I I I AACCAGAAAGCC 
MINIUM I TCAACCAGAAAGCC 

I I M i I I 1 1 I I 1 'GTGTCAAAGGCAG 
25 n | | I 1 I I II I ITA AGTCCGGATTG 

1 1 1 1 1 1 I I U I I GCGACATGCTGAT 
I 1 1 I 1 1 I I I I ITA TCAGCCTGCCGC 
TTTTTTTTTTTTGTCGGTAGGGTAA 
1 H I | i | II II I GTCGGTGGGGTAA 
30 TTTTTTTTTTTTCAACTCATAAGGG 
XTTTTTTTTTTTTTCACTGCTT AAA 

mmi iiin cGCCAGTcecACc 

1 I 1 I I t 1 1 I I It CTAGTCATAAGGG 
I I 1 1 1 1 I 1 1 I I I CACTGATTTGACG 
35 | 1 1 1 1 1 I 1 1 I I 1 GGCCACACAGGGA 
m i lUHUIIM CCCCCATTGT 

min i mi i tgaccagaaaggg 

H I IUIIlll IA CACTGGGGGATA 

III I I 1 I U M I r CAGCCGCCTTCG 
40 T l H I I 1 I I H I GTGGCCAGCTCGT 

nillUIN r TCTCATATGAATTG 
n i l U III 1 1 1 ' T G TAAAGGG AGCG 

I I I H I I I 1 I 1 1 CGTAAAGGGAGCG 
rTTTTTTTTTTTGGCGGCTCCCTCC 

45 1 1 1 1 1 I I 1 1 H r CAGATGTTCCTCC 
TTTTTTTT TTTTGTCTCACGACACG 
TTTTTTTTTTTTTCAGCCGCCTACG 
U 1 I I I I I 1 HI 1 1 GTGCTAATACC 
TTTTTTTTTTTTCTTG G AACTGCAT 

50 TTTTTTTTTTTTAGTACTCACCCGT 
TTTTTTTTTTTTATTGCTCCATCAG 

I I I 1 1 I H I I I I I GATCCTGAGCCA 

I I I I M I I I I 1 1 AGC AAGT AGAACG 

II I 1 1 I I 1 1 II I I 'GCAAGTAGAACG 
55 1 1 1 I M I I I I I I GATAACCGCAAGG 

1 1 1 H 1 1 I I II rGCAAGCGTTTTCC 

I I I I I I I 1 1 1 1 f GAATACCTCCTTT 
T 1 1 I I I I I I I I I ACAGAGCTTTACA 

I I I I III I I I I r TGTCCTTCGGGAG 
60 I 1 1 I I I I I I I rTAGGCGGCTTGCTG 
Heterozygotes

From CFT if available, otherwise greedy algorithm. 

DPA1

TTTTTTTTTTTTGCCCAGGGCACAG
TTTTTTTTTTTTCTGTTGTTCTATG
TTTTTTTTTTTTAAGGAAAAGGCTC
TTTTTTTTTTTTATGAAGATGAGCA
TTTTTTTTTTTTCACCCTCAGTGAC
TTTTTTTTTTTTGTCAACTTATGCC
TTTTTTTTTTTTGCAGGAAGAGGCT
TTTTTTTTTTTTTTTGTACAGACGC
TTTTTTTTTTTTCGGTCTCCTTCTT
TTTTTTTTTTTTGCAATGGGGAGCC
TTTTTTTTTTTTTGGATCTGGATAA
TTTTTTTTTTTTTGATGAAGATGAG
TTTTTTTTTTTTTGTTTGTACAGAC
TTTTTTTTTTTTCGTTTGTACAGAC
TTTTTTTTTTTTCTCAGGCCGCCAA
TTTTTTTTTTTTCTCAGGCCACCAA
TTTTTTTTTTTTATGTGGATCTGGA
TTTTTTTTTTTTACACTCAGGCCGC
TTTTTTTTTTTTCACACTCAGGCCG
TTTTTTTTTTTTTCAGGCCACCAAC
TTTTTTTTTTTTCGTCTGTACAAAC
TTTTTTTTTTTTAGAACATCTCATC
TTTTTTTTTTTTAGAACTGCTCATC
TTTTTTTTTTTTTTGAATTTGATGA
TTTTTTTTTTTTTTGAGTTTGATGA

DPB1

35 I I I I I I 1 1 1 1 1 1 C AACCGG G AG GAG 

I I ! 1 1 I I I I U I CAACCTGGAGGAG 

I I I I 1 I I I I 1 1 1 CATCCTGGAGGAG 
35 1 1 M I I I I I I I I I GCTGGGGGGTCA 

I I I I I I I I I 1 I I GGCCTGACGAGGA 

40 rrr, tttttttaactacgagctgg 

1 1 1 I I M i I I 1 1 I CCAGAGAATTAC 
I I I I I I H H I I I GCCGTAACTGGT 

i 1 1 m n 1 1 1 1 i ccagtactcctc 

I 1 1 I I I 1 1 I 1 1 1 AGTGCCGGACAGG 
40 45 H I II I M III l ACCCCCCAGCAGG 

I i I I I I 1 1 I I I I AGAGAATTACGTG 
I I 1 1 I I 1 1 I 1 1 I I CCAGTACTCCGC 
I I 1 1 I 1 I I I I I I GCATTCCTGCCGT 
I I 1 1 I I 1 1 I I I I CGGGAGGAGCTCG 
50 I I 1 1 I 1 1 1 I I I I CAGCCAGAAGGAC 
I 11 1 1 I I I I I I l ATTGCCGGACAGfa 
I I M I I 1 1 I M I CTGCAGCGCCGAG 
I I 1 1 I I I I I I I I GCGCGTACTCCTC 
TTTACAGAATTACCTT 
AAGTGTACCAG 
I I 1 1 I I 1 1 I 1 1 I ATCCTGGAGGAGA 
I I I I I I II I 1 1 I GGTCATGGGCCCG 
50 I I I I I I I I I I I I GGGAGGAGTACGC 

I I 1 1 1 1 1 1 I 1 1 1 I GGGGCGGCICTGA 
TTTTTTTTTTTTAAAAGGTAATTCT
I 11 1 M I I I I I I CTGCCGTAACTGG 
IlilllllllHI I GTGTCTGCATA 
I I 1 1 I I I I I 1 1 I GGCTGTTCCAGTA 
10 5 I I I I I I I I I I I I GTCCCTGGTACAC 

T I 1 1 I I I I I I I I CCTGCAGCGCCGA 
I 1 1 1 I 1 1 I I I M I CTTGGAGGGGGA 
I 1 1 1 I I I I I I I IGAGGTCCTTCTGG 

I I I I ! I I I l I I I CAACCGGCAGGAG 
1 0 1 1 I I I I I I I I M I GTGTCTGCATAC 

f( - 1 1 1 I I I I I I I I I CGGGAGC.AGTTCG 

10 II I I I I M II I I IGACCCTGC-AGCG 

1 1 1 I I I 1 I I 1 1 1 CAGAGAATTACCT 

I I I I I I I I I 1 1 I I GGGTAGAAATCC 
! 5 I f I I I 1 I I I I I I I IACGTGCACCAG 

I I I I I I I I I I I I CGCTGCAGGGTCA 
1 1 I I I I I I H I I AGCCAGAAGG ACA 
20 I I I I I i I I I I I I GTTCCAGTAGTCC 

I I 1 1 1 1 1 I I I I I GGCCTGCTGCGGA 
20 I I 1 1 1 I I U I I I I GCAGCGCCGAGG 
I I H I II I I I I I A CTACGAGCTGGT 

I U 1 I I I I I I I I CTGGGGCGGCCTG 

I I 1 1 I H I I I I I A CAGCGACGTGGG 
I N I 1 1 I I I I I I I GCCGGACAGGAT 

25 25 1 1 1 1 I j I I I I I ICTGCCGTCCCTGG 

11 11 I I 11 I M ICATGGGCCCGACC 
1 1 1 I 1 1 I I I I I I GTCCCATTAAACG 
II I II I I II I I I GTAACTGGTACAC 

I I 1 1 1 1 I I 1 1 I 1 A AGGACCTCCTGG 
30 II I I 1 1 I I I I I I CTCCTGGAGGAGA 

U 1 1 H I I 1 I I 1 G AGAATTACGTGT 
30 ! I 1 1 M H H I I CCTGATGAGGTGT 

I I 1 1 1 1 I I 1 1 1 I CACAGGAGGAGCA 
1 I I I I I I I I I I I I GCCGTCCCTGGT 

35 1 I I I 1 1 I I I I I I GGGAGGAGTTCGC 
H 1 1 II I H I I I I GGACAGGAGGAA 
I 1 1 1 I I I I I I I IACCCTGCAGCGTC 
35 1 1 1 1 1 1 1 1 1 I I I CCGCCCGGAACTC 

I 1 1 I 1 I I I I 1 I I GCTGCAGGGTCAC 

40 1 I I I 1 1 I I I I I I ACAGGACTATCCA 

I I 1 1 1 1 I I I I I IGCGTACTCCTGCC 
U 1 1 I II 1 1 I I I CCGTAACTGGTGC 
H I 1 1 1 I I I I I I GCAGGAATGCTAC 
TTTTTTTTTTTTCC AC ~ " AGCATTC 

40 45 I I 1 1 I I I I I 1 1 1 AACCQGGAGGAG 

I I I I I I I I I I I I 1 GGCCTC.AGGCGGA 
I I I , I I I I II I I A CTACGAGCTGGG 

1 I 1 1 I I I I I I I I ATGAGGTGTACTG 
I I I I I I I I 1 I I I A TACATCTACAAC 
50 TTTTTTTTTTTTTAACTGGTACACT 

I 1 1 I H I I 1 I I f CACGTAATTCTCT 
45 1 1 I t . I I I I I 1 I AGCATTCCTGCCG 

II I I H I I II I I A CTGGTACACTTA 

I I 1 1 I I I I I I I I GGCAATGCCCGCT 
55 I I 1 1 1 1 I I I I I I GCTTCGTGCTGGG 

II 1 1 I I I 1 1 I I r CGCCCGGAACTCT 
I 1 1 I I I I 1 1 I I I ACAGGACTGTCCA 

„ II I I I I I 1 1 I I I rCCTCCAGGAGGT 

50 I 1 1 1 I M 1 1 I I I CCTTCTGGCTGTT 

60 1 1 1 I 1 I I I I I I I GTTCCAGTACTCC 
TTTTTTTTTTTTGCGCTGCAGGGTC
1 I 1 I II I 1 1 I I I A ACCTGGAGGAGA 

I 1 1 M I I M U 1 1 I C CTGCCGTAAC 

II I I I I I I I I I I ACGCTGCAGGGTC 
5 MUnil llllCCACAGAATTACC 

1 1 I I I I I I I 1 I I CCAGAGAATTACG 
1 1 I I I I 1 1 I I I ICGCCGAGTCCAGC 
1 1 1 1 I I 1 1 I 1 1 1 A ACAGGCAGGAGT 
1 I I 1 I I I I I I I I I CCTCGAGGATGT 
10 I I I I I I H 1 I I i AACCGGCAGGAGT 
I H III I i l l I 1 CTCCAGAGAATTA 

I I M I I I I I I I I GTTCCAGTACACC 

II U I 1 I i I I I I CTCCTGTAGGAGA 
II M I I m I I I UACCTTTTCCAG 

1 5 H i I I I I I I I I I GGAGGAGTTCGTG 
H I 1 H I 1 I I I I GAGGAGCTCGTGC 
1 1 I II I I I I I I I GCCGTAACTGGTG 

I I I M I I I I I I I GCCCGCTCCTCCT 

I I H I I Ml I I 1 CGTCCCTGGAAAA 
20 M I I 1 1 I I 1 I I 1 GCCGTCCCTGGAA 

1 1 1 1 1 I I I I I I I CCCCTCCAAGAAG 
I I I I I 1 I I I I I IGCTGCCTGGGTAG 
TTTTTTTTTTTTTCCAGTAGTCCTC 
I I 1 1 I I 1 1 I I I I ATTCCTGCCGTAA 
25 II I I I I I I I I I I CCTGGAAAAGGTA 
I I I I I I I I I I I I CGTCCCTGGTACA 



I I 1 1 I H I II I I ATCTCCCTGCTGG 
30 1 1 I I I II 1 1 1 1 I GAAGGAGAACCTG 
1 I I I 1 1 I I I I I I CGTGCACCAGTTA 
I II I 1 1 I II I I ICGGACAGGGTATG 
1 1 1 I 1 1 1 1 I I 1 1 CGGACAGGATATG 
I 1 1 I 1 1 I 1 1 I I I GCACTCGGCGCTG 
35 I 1 1 I 1 1 I 1 1 I 1 1 ACACGTAATTCTC 
1 1 1 1 1 1 I 1 1 I 1 1 CGTAACTGGTACA 
1 1 1 I I I 1 I I I I I AATGACCCCCCAG 
1 1 1 H I1 1 I M I T CTCTCCAGGAAG 
I 1 1 I 1 1 1 1 1 1 1 1 CAGCGACGTGGGA 
40 I I ) I I I 1 1 I I 1 1 ICCTGCCGGTTGT 
I 1 1 I I I I II I 1 1 GAAGGACATCCTG 
T 1 1 I I I H I 1 1 1 GAAGGACCTCCTG 
1 1 1 M I 1 1 M 1 1 I GTTCCAGTACAC 

I I I I I I I 1 1 I I ICAGAAGGACAACC 
45 I 1 1 1 1 I 1 1 1 1 1 1 GCCTGATGAGGTG 

DQA1 

I I I I 1 1 1 1 1 I I I CACAAGAGGCAAC 
50 I I I I I I I I I I I I CATAAGAGGCAAC 

I I I I II I I I I 1 1 GAACAC.AGGCAAC 
H II II I I ! ! 1 1 ACATCCTCATCTG 
I 1 1 I I I 1 I I I I I GAGTGCCCATTGC 
I I I I 1 1 1 1 II 1 1 C AG C C ACAATGTC 

55 1 I I IHI I H 1 1 ACAATCCCAGGGC 
1 11 I 1 1 1 1 I I 1 1 ACAACCCCAGGGC 
1 I I I 1 1 1 1 1 I 1 1 GTGGGCATTGTGG 
I I I I 1 1 III I 1 1 ATGGGCATTGTGG 
HI I I I 1 1 1 I 1 1 CCAACACCCTCAT 

60 I I I I I I II I II I AGACTGTGGTCTG 
TTTTTTTTTTTTCCAACATCCTCAT
I I I I I IN I II I GGCCCACAGACAA 
I I I I I I I I I I I I CATGGGCATTGTG 

I I I I I I I I I I I I AACATCCTCATCT 
5 1 1 I I I I 1 1 M I ICAACACCCTCATT 

I I I I I i I I i I C rGACTGTGGTCTGC 
1 1 I I I I I I 1 1 I I AGCACTGGGGACT 
1 1 I I I I 1 1 1 1 I ' I CTT AGATTTGACC 

1 1 I I I I I I 1 1 I I 1 1 I AGATTTG ACC 
10 I I I 1 1 I I I I I I I CGATGTTCAAGTT 
1 M 1 I I I I 1 1 I I CAATCCCAGGGCG 
1 I I I I I I I I I I I CCTCGGATGATGA 
1 1 1 1 I I I I II I I ICCACATAGAACT 
I 1 1 1 I I I I 1 1 I I AAATTCATGGGTG 
15 1 1 I I I I I I 1 1 I ICAGCCACAATGCC 
M | | I I I II II I CACCATAAGAGGC 



1 1 I 1 ! H I I II I A ACTCTCCTCAG 
I I 111 I I I I I I I 1 AAATCTCATGAG 
20 1 1 1 1 1 1 I I 1 1 I I CTCCTCCCTTCTG 
ll lll l ini l T GTCAGCCACAATG 
TTTTT i I I I I I I I CATTCCTTCTTC 
1 1 1 1 1 1 I I I I I I CTTCCTCCCTTCT 



25 II J I I I I I I I I I GAGGCTCATCCAG 

I I I I 1 1 1 1 1 1 I r CAGGCTTGTCCAG 
1 1 1 I I I I 1 1 1 I I A TGTTGACCACAG 

1 1 1 I 1 1 I 1 1 1 I IAGTGCCCACCACA 

I I I I 1 1 I I I I 1 1 GAACATCCTGATT 
30 I I I I 1 1 I 1 1 I I I GGACCTGGAGAAG 

I I M 1 1 I I 1 1 I I CCCTCTGGCCAGT 

I I I 1 1 1 I 1 1 I I I CCCTCTGGGR-AGT 
11 1 I I U I I I 1 1 I I A CACCGTAAGA 

I I | I 1 1 I 1 1 1 1 1 A GAAGATTTGACC 
35 1 I I I I I I I I I 1 1 GAACTGGCCAGAG 

11 1 1 1 1 1 1 1 1 1 1 GCTACAACTCTAC 
II I I I 1 I 1 1 I 1 1 CAGTCTTACGGTC 
II I 1 I I 1 1 1 1 1 1 CAGTCTTATGGTC 

DQB1

| M 1 1 I 1 1 1 I 1 1 ATCTTGCAGAGGA 
l ll l ll III 1 1 1 GGCTGGGGTGCTC 
I I I I I I I 1 1 I I I GGGTCACCGCCCG 

45 1 1 II 1 1 1 1 1 I 1 1 CTGGGGCCGCCTG 
M ill llllll I CTCGGCGCTAGGC 
I I I 1 1 I 1 1 1 I 1 1 GTATCTGGTCACA 
I I I 1 1 I I 1 1 I I I AACTACGAGGTGG 
I I I N I I 1 1 I I I CCAGTACTCGGCG 

50 1 1 I 1 1 I U I I I I CGGTTATAGATGT 
TTTT* 7 , i I I I I GCAAGTCCTGGAG 



I I 1 1 1 I I I I I I I GGCCTTAAACTGG 
55 M 1 1 1 I M M M I GTGTCTCCATAC 

M 1 1 I I I 1 1 1 1 I GTCGGAAAGGGCT 

I I I [ I I 1 1 1 I I I GGGTGTATCGGGT 
I I I 1 1 I I 1 1 I I I CCAGTACTCGGCA 
I1 HMIH II I GTAGACATCTCCA 

60 I I I 1 M I HI I I AGGAAACGGGCGG 



'CCTCCCTTCTG 



fATAACTCTCCTCA 
TTTTTTTTTTTTCACACCCCGCACG 
I I I I I I I 1 I I I I I CCGCTCGGGTGC 
I I 1 1 I I I I I 1 I IAGCATCACCAGGA 
H 1 1 1 I I t I I I I CCAGTTTAAGGGC 
10 5 I I I I I I I I I I I I ATAGCCACAAGGA 

M I I I I I I I I I I GTATGCAGACACA 
TTTTTTTTTTTTTCCAGTACTCGGC 
I I I I II I I HI I AGCGCACGATCTC 
1 I I I I 1 I II I i I G G AC ATCCTGG AG 
10 ! I I I I I It I I I I I GGGGCTGCCTGA 
I H || I I I I I I I GTCAGAAAGGGCT 
*5 m i m i i TTI CAGGAGCCCTTTC 

1 I I I I 1 I I I I 1 I TG TCTCTT CCTG G 
M I I I I I 1 I 1 I I ACACCCCGCACGC 
1 5 II 1 I I I 1 1 I 1 1 I T GGTTTCGGAATG 
M M I I II II 1 I AACGGGACAGAGC 
] ] I i I 1 1 1 1 I 1 1 GCTGGGGCCGCCT 
on II I I I I I 1 1 I T TGAGGATTTCGTGT 

113 I 1 1 I I I I I I I I 1 GAGAGGAGTACGC 

20 H I I 1 I 1 1 I t 1 1 CACATCAAAGTCC 
1 1 1 1 I I I I I I M GCCAGGAGGAGAC 
] 1 1 1 I I I 1 1 i I I GTACTCGGCGGCA 

III 1 1 I I I 1 I 1 1 CGCCAGTTGTCTC 
T 1 1 I I I I 1 1 I I I AGGGGGGTGGACA 
25 25 TTTTTTTTTTTT AG ATG TATCTGGT 

1 1 I II I I I I I I 1 \ GGGGGAGTTCCG 
MUnilH I H GTCTCCTCCTGG 
1 I I I I I 1 1 I II I CACACTCTGTCCA 
1 I 1 I II I I 1 1 I I GGAATGATCAGGA 
30 I IIIMIHU I A TGGGGTCGCCGC 
I 1 I I I I I I I I I I CAGATCAAAGTCC 
30 1 1 I I 1 1 1 1 I 1 1 1 AACGGGACCGAGC 

I I I I I I I I I 1 1 I AGGAGTACGTGCG 

I I I 1 I I I I I 1 1 1 ATGTGACCAGATA 
35 1 1 I 1 I I I I I 1 1 I AGGGGCGGCCTGT 

TTTTTTTTTTTTCGCCGGTTGTCTC 

I I I 1 I I I II 1 11 rGTAACCAGACAC 
1 | | | l | | II 1 1 I GTGAAGTAGCACA 

35 1 1 | 1 1 1 I 1 1 1 1 I AGCGGCGACCCCA 

40 1 I M I 1 I I I 1 1 1 CACACCCTGTCCA 
I I I I I I I I I I I I GTGTGACCAGATA 

II I Il l ll l 111 I GGACCTTCCAGA 
TTTTTTTTTTTT ATCG GGTG GTG AC 
I II H I I II I I I GTTTAAGGGCCTG 

dn 45 1 1 | I 1 1 I I ) 1 1 I TGAAGTAGCACAG 

w TTTTTTTTTTTTGCTCCAACTGGTA 

1 | I I I I 1 1 I I I I CCTTAAACTGGTA 

I I I 1 II I I I 1 1 I AGGAGGACGTGCG 

I I I M I 1 1 I I I I I CGTGCTGGGGCT 
50 | I I I I I I H I 1 I CGCTGCTGGGGCT 

1 I 1 I I I I ' 1 ' ' CCAAGGAAGATCA 
4$ 1 1 I I 1 1 1 I I I I I ACCGCGCGGTGAC 

H 1 I 1 1 1 1 I I I r GCCCTTAAACTGG 
T I I 1 I I 1 I 1 I I I I GGTCACACCCCG 

55 1 H 1 11 1 1 i M I GGGAGTTCCGGGC 
I | 1 I I I I I I 1 1 I AGGAGGAG ACAAC 
I 1 I I 1 I I I I I I I GGGTGG AC ACAAC 
T 1 1 I I I I I I I I I I CTGCTCGGTGAC 
50 T I I 1 I H I I I 1 I I GGGGCGGCTTG A 

60 I I II I I I I I 11 I 'GCGCACGTCCTCC 
I I I I I I I II I I I AGGATTTCGTGTA 
M I III I I I I I I GCCTTAAACTGGA 

DRB345 






I | | I I I II I I I I GTACCTGG ACAGA 

II I I I I I I I I I I GTTCCTGGAGAGA 

I I I I I 1 1 I 1 1 1 I ACACTCATACTTA 

I I I I I I I I I I I I ACACTCAGACTTA 
1 0 I 1 1 I I I I I I I I 1 1 CCTGGAGCAGGC 

II II HI I I II I I CGAAGCGCGCGT 
I I I I I I I I I I I I AATCTGCACAGAG 

I I 1 1 I I I I I I I I AGGGCCCGCCTGT 
M II I I I I I I I I AGG ACACTCTGG A 
15 I 1 1 1 I II I 1 1 1 I G TGTAAACCTCTC 

I I I I 1 1 1 I I I I I CTGTCGAAGCGCA 

1 1 1 1 I 1 1 I I I I IGGGGCCGGGCTGT 
20 I I I I 1 1 1 I I I I I I CTTCCAGGATGT 

1 1 1 1 I I I I I I I I AACTACGGAGTTG 
20 1 I M I 1 1 I I 1 1 I CAAGAAACATGGT 
111 111 11 1 1 11 I AACCAGGAGGAG 
1 1 1 I I 1 1 I I I I I I GAAGCTCTCCAC 
GGGGCGGCCTGTC 
TGCGGCGCGCGTGT 
25 25 I 1 1 M 1 1 1 I 1 1 1 I I IU 1 GGAGCTG 

I I 1 1 I I 1 1 I 1 1 1 1 I CTCTTCCTGGC 
I 1 1 1 1 1 1 1 I 1 1 1 AACTACGGGGTTG 
I 1 1 II 1 1 1 III I GTATCTGATCAGG 
I I I I I I 1 1 I 1 1 I GGCCAGGTGGACA 
30 ill 1 1 II I I I H GCCGCAGCTCCGT 
1 1 1 1 1 1 1 1 I 1 1 1 GGTTCCTGGAGAG 
30 I I 1 H M I I I I I GTCGAAGCGCACG 

GTGTCTGCAGTAG 
GCTCCACTTGGCA 
35 I I I M I U I 1 1 I I ACGGGGTTGGTG 
I I I I I I I I I I 1 1 CGGTTCCTGCACA 
I ] I I I I 1 1 I 1 1 1 I CCAGTACTCGGC 
I 1 1 I I I I H I 1 1 I GTCCACCTCGGC 
I I I I I I I M I 1 1 I CTTCCTGGCCGT 
40 TTTTT . I ll 1 1 1 GGTGTCCACCAGG 
I I 1 1 1 1 I I I 1 1 1 ACTCCGTAGTTGT 

I 1 1 1 1 1 1 1 II 1 1 CACTCAGACTTAC 

I I 1 1 1 1 1 1 1 I 1 1 GATGCTAGAAACA 

I 1 1 I I I I I I I 1 1 GTGGAATGGAGAG 
40 45 I 1 1 1 I 1 1 I I I I I I AACCAAGAGGAG 

TTTTTTTTTTTTGTTCCGGAATGGC 
1 I I 1 1 1 1 1 1 1 1 I 'GTATCTGCAGTAG 

I I I I 1 1 I 1 1 I I I ACCTCCTGGTCTG 

II I II I I I I H I AGCCAACAGGACT 
50 I H I 1 1 I I I II I GCGGTTCCTGCAG 

1 1 1 1 1 1 I I 1 1 I I CGCGCCGCGGTGG 
45 1 1 I 1 1 1 I 1 1 1 1 1 GTAAACCTCTCCA 

H I I 1 1 i I I I I I CTGATCAGGCTCC 
I I I I I I I I I I I I 1 CCAGGACTCGGC 

55 TTTT TTTTTTTTAACC ATTCAC AG A 
I 1 1 1 I I 1 1 1 1 1 I CGGGCCCTGGTGG 
| I I I I 1 1 I I I I I GTTCCGGAACGGC 
I I 1 1 I I I I I 1 1 I GCGGCCCGCCTGT 
50 I 1 1 II I I I I I I I I CCTGGAAGACAC 

60 I I 1 1 I I I I I 1 1 I GCCGGGTGGACAA 
1 1 1 I I II I 11 1 I CTGCTCCAGGATG 
1 I I I I I I I I I I I CAACTACTGCAGA 
H I I 1 I I I I I I 1 GTACCTGGAGAGA 

I 1 1 I 1 I I I I 1 I I ACCTCTCCACTCC 
5 1 I U I I I 11 I I I GTGAAGCTCTCCA 

II i I I I I I 1 I I ICCGCGGCGCGCGT 
I I 1 I I I I I I I I I CTGATCAGGTTCC 

1 | | I 1 II I 1 I I I AATGGG ACGG AGC 
1 1 1 I I I I I II II IATGGAAGTATCT 
10 I I I I I I I I I 1 I I I CTGCAGTAGGTG 
I | | | | | I I I I I ICGGGCCGCGGTGG 
I 1 1 1 1 I I 1 I I I I CTGTGCAGGAACC 

I 1 I I I I I I I I I I CCAAGAGGAGGAC 

II 1 I I I I I I I I 1 CAATTACTGCAGA 
1 i 1 1 1 I I I I I I I I I CACCTACTGCAGA 

I I I I 1 I I I I I I I CTGCCTGGATAGA 
I 1 1 II 1 1 I I I I I GTAATTGTCCACC 
I I I I I II I I I 1 I CACCAGGGCCCGC 
1 1 I 1 I I I I M H I GCGGTACCTGCA 
20 T T 1 1 1 I I I I I I I CCTCCAGCAuCAC 
1 1 1 1 I I 1 1 I I I I GCGGCGCGCCTGT 

I 1 I 1 I I I I 1 1 I I CCAGGACTCGGCA 
TTTTTTTTTTTTGACACAACTACGG 

II 1 I 1 1 I I I I I I GATACAACTACGG 
25 1 I I I I I ) 1 I I I I A CTCAG ACTTAC A 

1 1 M 1 I 1 I 1 1 1 1 I GAGACTTACACA 
M I 1 1 I 1 1 1 1 I I I A CGGGGTTGTGG 
1 I I I 1 1 I I I I I I GTAGTTGTCCACC 
I I I I I I I i I I 1 I AACCAGGAGGAGT 

30 1 1 1 I I I 1 I 1 1 1 1 A ACCAAG AGGAGT 
H II 1 1 1 I I I I I I CCACAGCCCCGT 
I 1 1 I 1 1 I I I I 1 1 CAGCCAGAAGGAC 
I 1 1 I I I I I I \ I IGGAGGAGTTCCTG 
i I 1 I I I I I I I 1 1 GAACTCCTCCTGG 

35 M I I I I I I M I I AACCACTCACAGA 
TTTTTTTTTTTTG GCCGGGCTGTTC 
1 I I I I I I I I I 1 1 CTCACGAGTCCTG 
1 1 1 I I I I I I I 1 I GTCGAAGCGCAAG 
I I I I 1 1 I I I I I I CCTCCTGGTCTGT 






HLA-A 



1 1 1 I 1 1 1 1 1 1 1 1 1 CAGTCTGTGAGT 
1 1 I I 1 1 I I I I I ICCGCAGGCTCTCT 
45 TTTTTTTTTTTTATGAGGTATTTCT 

I I I I 1 1 1 1 1 1 1 1 GGACATGGAGGTG 

I I I I I H I M I 1 C-AGGTAGGCTCTC 
I 1 I I 1 1 I I 1 1 I I 1 ACTCTTGGGGGC 

1 I 1 1 I I N H 1 1 GGTCGCCAGGTCC 
50 I I I I 1 1 I II I I I G GGAGCCCG CCCA 
I | | | | 1 , . | I I ICCGCTGCTCCGCC 
I I I I 1 1 I I I I I I I QAAGGCCCAGTC 
M I I I I II I I I I GCAGCCATACATC 

I I 1 1 1 I I I I H I CCACTCCACGCAC 
55 I I 1 1 1 1 M 11 I I CACGTCGCAGCCA 

II I I I I I I I I I I GGTCTGCCCGAGC 
I I 1 I 11 I I I I I I CAGGTAGACTCTC 

I I I I II 1 I I I I I GGGAGACACGGAA 
I I 1 I I I I I I I I I CCCGTCCACGCAC 
60 I I 1 1 I I I I 1 I I I GTCCACTCGGTCA 
1 1 II I I II H I I A TCCAGAGGATGT 
1 1 1 1 M I I I II I CGCGATCCGCAGG 

I i 1 1 I I 1 I I I I I CCGGGACACGGAA 

II I I t 11 I I II I GGAGGAGGAACAG 
5 I 1 1 1 I 1 I I I I I I AAGTGAAGGCCCA 

1 1 I I I I I I I I I I GGGGCTTGGGGAG 
M l I M 1 I I I I I CAGACTAACCGAG 
H 1 1 I I I M I 1 T GTCCTGGGGGGGT 

I II I H I I I I I I CGTCGTAAGCGTC 
10 I I 1 1 U I I I I I I A GGTCCACTCGGT 

I I 1 I I I I I M I I GGTAGGCTCTGAA 

I 1 1 1 I I I I I I I I GCGCGATCCGCAG 

I I 1 I H I I I II \ GTGTCCTGGGTCT 

I I 1 1 I I I I M I I A TCCAGATAATGT 
15 m i M I IHt I CCGTCGTAGGCGT 

TTTTTTTTTTTTTCATATTCC GTG T 

I I 1 1 I I I I I 1 1 iCGGACCCCCCCCA 
I 1 | | I I I I I I I i GCCGCATGGACCG 

I I 1 1 I I I I I I I IGCTGCTCCGCCGC 
20 1 1 1 1 I I I I I I I I AGCGCAGGTCCTC 

I I 1 1 1 I 1 1 1 I I I CTACCTGGATGGC 
milllllll I GGTATTTCTTCAC 

I I 1 I I I I I I I I I ATATGAAGGCCCA 
H I 111 I I II I I CCGTGTCTCCCCG 

25 1 1 I 1 1 I I I I I I I CCGGCAGTGGAGA 

I I I I I I I M I I ICGGACGCCCCCAA 
1 1 I 1 1 I I I I I I I CCGTGAGGCGGAG 
I ) I I I I I I 1 1 I I AGGAGACAGGGAA 

I I I 1 1 I 1 1 I I I I A GAGCGAGGACGG 
30 1 1 | 1 1 1 I 1 1 1 I I GCACATGGCAGGT 

I I I 1 1 1 I 1 1 1 I IC AQCTGCTCCGCC 
1 1 II I I I 1 1 1 1 I A TGAACAGCACGC 

1 1 I 1 1 1 I 1 1 1 I I C CCGGCCCGGCAG 
1 1 I 1 1 1 I II I I I G CAGCCTGAGAGT 
35 TTTTTTTTTTTTGACGGTCATGGC 

I I I I 1 I 1 1 1 1 I 1C CGTCGTAAGCGT 

I I I I 1 1 H I N I GAGTATTGGGACC 
I I III I 1 1 I H I CTGGCCTGGTTCT 
I I I I I I I II I I I ACCTCATGGAGTG 

40 1 1 I I 1 1 1 1 I I I IAGCCGCCATGTCC 

I I I I I I I I I I I I CACGTGCCATCCA 
H II I I 1 1 II I I GGTCCCCAGGTTC 

I I I I I I I I I I I I AGGAGAAGACATA 
1 1 I I I I I I I I I I CTGCTGCTCCGCC 

45 I I Ml U I I I I I I GACCC AGACCAG 
mi l ll l lll I CGGGCGGAGCAGT 
lll l lll l ll rTAGGTTCGCTCGGT 
1 1 I I I 1 I 1 1 I I 1 CATATGCGTCCTG 
1 1 1 1 1 1 1 1 I 1 1 1 CGTCCTGGGGGGG 

50 1 1 I M 1 I 1 1 I 1 1 GCACGTGCGTGG A 

I I I I 1 1 I I I I I I GGTATTTCTACAC 

I I | | 1 1 III I i r AGGAGCAGAGATA 

m i nimi i cccgaaccctcgt 

mi ll llll l I GCCACATGGGCCG 
55 1 1 I I I I I I I I I \ AGCAGGAGGAGCC 

mniniii i atccagatqatgt 

I I I I I I I I I I 1 1 ggatggggagcac 
H I I II I I I I I i gcactggcgcttc 

I I 1 1 1 1 I I I I I i a gcttgtaaagtg 
60 1 1 1 I 1 1 I I I I I i gataatgtatggc 
TTTTTTTTTTTTTCACACCCTCCAG 
1 1 | | I I I I I I I I CTACGTGGACAAC 
H I I 1 I I I I I I I CGAGCGAACCTGG 
T 1 II Hi I) I I I CGAGAC,AGCCTGC 

10 5 I I I I I I It I I I I GGGCTACGTGGAC 

i I I I I I I I I I I I A CCACCAGTACGC 
I I 1 I I I I I I I I I GAGGATGTATGGC 
TTTTTTTTTTTTGATCTCAGCCGCC 
Tl I I I I I HI 1 1 GATCTGAGCTGCC 
10 I I I I ) I I I I H I GATGATGTATGGC 
1 ; 1 1 1 1 I I I I I I A TACCTGGAG AAC 

15 Tl 1 I 1 I I I I M 1 GATGTATGGCTGC 

I 1 I I 1 1 I H III I CCGCAGGTTCTC 

I I I I I I 1 1 I I I IGAGCAGAGATAAA 
15 1 I H H 1 1 I I U GGGCTGGGAAGAC 

H 1 1 I I II I U I GATGGGCAGGACT 
1 | ( ] I I 1 1 I I 1 1 I CACTTTCCCTGT 
on 1 1 1 1 1 1 I I I I I T CCCACGATGTGGA 

dU I i I 1 1 I 1 I I I I I AGTCATATGCGTT 

20 1 1 11111111 1 I GGCGGACATGGCG 
1 H 1 1 1 1 1 I 1 1 I GCTCCGCCTCACG 
I I 1 I H H I I I 1 CGTCGTAAGCGTT 
I 1 I I I I I I I I I 1 GATCATGTTTGGC 
TTTTTTTTTTTTCACG GACGCCCCC 
25 25 I I 1 1 1 1 1 I 1 I I I GCTCCTCCTGCTC 

I I I I I I I I I I I I ACTCACCGAGTGG 

I I 11 1 I I I I I I I A GTCATATGTGTC 

1 1 1 1 I I I 1 U I 1 GGTCTGAGCTGCC 
1 1 1 j 1 I 1 I I 1 I 1 ICCCACTTGCGCT 
30 T ■ I I 1 1 I I 1 1 I I GCCCACTCACAGA 
I I I I 11 I I I I I 1 GGCTCAC.ATCACC 

30 m il ium i g ctcttggaccgc 

I 1 1 1 1 1 I 1 1 I 1 1 G AGAGCCTGCGGA 
H I I I I I 1 1 I I 1 GGAACACACGGAA 
3 5 111 I 1 1 I I I I I I CGGAACACACGGA 
T 1 1 I 1 1 I 1 1 I 1 1 CGTAAGCGTCCTG 

III | 1 1 I 1 1 I 1 1 GCCGGTGCGTGGA 
1 1 1 1 1 1 I I I I I I GCCGCATGGGCCG 

35 1 1 1 1 1 1 I 1 1 I I 1 CCAGAGCGAGGAC 

40 I I I I 1 1 I 1 1 1 I 1CCCAACGGGCCGC 
1 I I I I I I 1 1 1 1 1 CGAGTGCGTGGAG 
I 1 I I I I I I I I I 1 GCGAACCTGGGGA 

I I 1 1 1 1 I 11 1 1 1 CGGGTACCAGCGG 
111 1 11 i l l III I G AAGCGGGGCTC 

45 11 1 I III I I 1 1 I GGCGGCCCGTTGG 

II 1 I I I I I I I I I 1C TGGGTCAGGGC 
H I I I I I I I 1 1 I GCCTCATGQGCCG 
Tl I III 1 I I I 1 1 CCATCCCGCTGCC 
TTTTTTTTTTTTAGCTCAGACCACC 

50 1 III I I I I 1 I I I GTCGTAAGCGTCC 
H I I I I I I I I I I CCCGGCCGCGGGA 
45 n i UHHin GGTCCCAATACTC 

1 1 I 1 I I I I I I I 1 CGTCCCAATACTC 

III 1 I I I I I 1 1 I GTTCTCACACCAT 
55 I I I I I I I I I I II I C CTCTGGATGGT 

I I I I I 1 1 I I I I I I CCCACTTGTGCT 
I I I I I I I I I I I ICCTGACCCAGACC 
1 I I I I I I I I I I I IGAGAGCCCGCCC 
50 1 1 1 1 1 I 1 I I II rGAGTGCGTGGAGT 

60 II I I I I 11 I I I I 1 ' AC ATC ATCTG G A 
I I I I 1 I HI I I I GATCCGCAGGTTC 
1 I I I t I I I I ! I I I AGAGCAGGAG AG 
I | I I I I 1 1 I I I I CCTGGCAGCGGGA 
1 I 1 1 1 1 I 1 1 I 1 1 I 'CATGGAGTGAGA 
10 5 | | H | H I I I I I CCGGCCGCGGGAA 

I H II | I II I I I CCAGGACACGGAG 
Mll l llllll [ CCGGGACACGGAG 

I I I I I I I I 11 I 1 GCAGCCACACATC 
TTTTTTTTTTTTG GATG GTGTG AG A 

1 0 11 1 1 I 1 H I I I IAACATCATCTGGA 
4 . M | | | | H 1 Mi 1 CCTCCTCCACAT 

15 TTTTTTTTTTTTTGGGCGGAGCAGT 

1 1 1 1 I 1 1 1 1 1 1 ri GCAGGGGATGGA 
1 1 I I I I 1 1 I I 1 1 CGCAGGAAGCGCC 
15 1 1 I 1 I M I I I rTGGCCGTCATGGCG 
1 1 M 1 1 I I I I I 1 ATGCGTCCTGGGG 
1 1 I 11 III I I I I ATGCGTCTTGGGG 
20 I I I I I I I I I I 1 1 I 1 1 CCCTGTCTCC 

m i imill l l CAGGGTGGCCTC 
20 I I I 11 1 1 1 I U I GAGGAGQAACAGC 
1 I Ml I I 1 1 1 I I GCGCAGGGTCGCC 
1I U I II III 1I C AGCCAAACATCC 

I 1 1 I I 1 I I I I I 1 ACTTCTGGAAGGT 

I I | l I I 1 11 I I rT CCTCTGGACGGT 
25 25 11 1 1 11 1 1 II 1 1 GGAGAAGAGATAC 

1 ! 1 1 11 1 1 1 H i A TTCCGTGTCTCC 
1 1 1 1 I 1 1 I I I 1 1 T C AATCTGTGAGT 

I 1 1 1 I 1 1 1 I 1 1 1 GGCCCGTCGGGCG 

I I 1 1 I 1 1 1 1 1 I I CGGCGGACATGGC 
30 1 1 1 1 I I I 1 1 1 I I I ACAAGCTGTGAG 

II II I I | I I I 1 I CGAACTGCGTGTC 
30 1 1 1 1 1 1 1 1 1 1 1 1 CGAGCTCCGTGTC 

M I 1 I II I 1 1 I ' TA CTCCACGCACCG 
I | I I 11 1 I I I 1 I CTACGTGGACGAC 


HLA-C 

H | l Ml I It 1 I I GAGCTGGGAGCC 
J0 M 1 1 I I I 1 1 I I 1 ATCACAACAGCCA 

40 I 1 1 1 I I H 1 1 I I A GGCTCTCCGCTC 
I 1 1 1 1 1 1 1 1 1 1 1 GGAGTGGGAGCAG 

II I I I 1 1 1 1 1 1 UCACACCCTCCAG 
1 1 1 1 1 I I I 1 1 I I a CTCCACGCACAG 
Mll l llllll I GCCGTCGTAGGCG 

40 45 I 1 1 I I 1 I I 1 II I CGCGCAQAACCCC 

Mlll l lll l l I A GTAGCCGCGCAG 
M ll ll ll lll I G GAGCGGACAGCC 
1 1 1 1 1 1 1 1 I 1 1 I CAGGTAGGCTCTC 
Ml I I I M I I I I GGTTCGGGGCTCC 

50 M l l l llllll I GCCCCAAGCCCTC 
M l ll ll llil. Z 3GCATGACCAGT 
45 1 II I 1 1 I 1 11 I I GCGGCTCCGCGGC 

1 I t I I I I I I I H I CCAGTGGATGTA 
I I I I I I I 1 1 M I GGCATG ACCAGTT 

55 I I I I M I 1 1 I I 1 CTCACTCGGTCAG 

I I I I 1 1 I I I M r CAAGCCCTCCTCC 
T 1 1 1 II I M 1 1 I I AGTTTCCGC.AGG 

I I I 1 1 1 I I I I I I CAGGTCGCAGCCA 
50 1 1 I I I I I 1 1 I I 1 CACTGCGATGAAG 

60 1 1 I 1 1 1 I I I I I I GGTATGACCAGTT 
I | I I 11 I I I I I I ACAGCCAGGCCAG 
TTTTTTTTTTTTG AGGCGG AG C AG C 
| | I I I I I I I I I I t GGTTGTAGTAGC 
I I I I I I 1 1 I I I I ACCTGCGGAAACT 

10 5 Til 1 I I 1 1 H I I CGGCCCAGGTCTC 

H i niUH I I GCTGGACGCAGCC 
1 1 I U I I I I I 1 I CAGGTTCCGCAGG 
I | | 1 1 I I 1 I I 1 I CCGCCAGGCACAG 
| | | | | | | H I I IC CTCCTACACATC 
10 TTTTTTTTTTTTACGGCGGAGCAGC 
1 1 1 I I 1 1 I I I 1 IA GCGCGCGGAACC 

•5 UI I III I HIII IC ACTCGGTCAG 

| | U | II II I I IA CGCCGCGAGTCC 
1 I 1 1 1 I I I I I I I \ GGAGCAGGA.GGG 
1 5 TTTTTTTTTTTTGGGTATGACCAGT 
I I 1 1 I I I I I 1 1 1 A TACCTGGAGAAC 
| | 1 1 1 I I U I I I GGGTTCGGGGCTC 
I 1 1 1 I I I I I M 1 GACCGCTAGGACA 

20 III I I 1 I 1 H I I ATCTGAGCCGCTG 

20 1 1 1 1 I I t 1 I I I I CGCGGAGAGCCCC 

I I I I I I I I I I I rCCTGGCGCTTGTA 

II I I 1 I 1 I I I I I CCTGCGGAAACTA 
[ I I I Ml I H I I AGCGTCTCCTTCC 

1 HI | 1 1 I I I I H- GGCGCCCCGAAC 
25 25 [ I I I I I I 1 1 I I 1 ATGATGTGAGACC 

T 1 1 1 1 1 1 I I I 1 1 CTCGGTGTCCTGG 

I II U I I II I 1 1 GTAGTAGCCGCGT 
I N | | | I I II 1 1 A GGATGTGAGACC 

I I I II I I 1 1 I 1 1 GGTAGGCTCTCTG 
30 1 1 1 1 1 1 1 1 1 1 1 1 AGCGTCTTCTTCC 

1 II II I I 1 1 1 1 I CATAGGAGGAAGA 
30 1 1 1 11 1 1 I II 1 1 G ACAACCAGGACA 

1 1 11 1 1 I 111 I I G CCGCGGGGAGCC 
1 1 1 I HI I I I I I GGTGAGGGGCTCT 
35 I Ml I I I II 1 1 1 CGAGGGGCTGCCA 
1 1 1 I I I I 1 I I 1 I GGGTATAACCAGT 
1 1 1 I I I I II H I I CCAGAATATGTA 

I I 1 11 I I I II I I G GGTGC AGGGCTC 
35 H I I 1 I I I I I I I CGCGCGGAACCCC 

40 1 1 1 I I I II I I I I I AGTAGCCGCGTA 

I I I I I I I I I I I I AGCTGCTCTCAGG 
1 1 1 1 1 1 I I 1 1 1 IA CCGCACGAACTG 
1 1 1 1 I M I I 1 1 ICCGCAGGCTCACT 
TTTTTTTTTTTTG GTG TG AG ACCCG 

45 1 1 I I I I I I I I I I I GGAGCCCCGAAC 
HU I ll i m I AGCCGCGGGAGCC 
1 I I I I I I I I I I I A CTGCACG AACTG 
llll l l 1 1 I 1 1 I CCGCACGAACTGT 
I II 1 1 1 I 1 1 1 1 | G GTGCAGGGCTCC 
50 TTTTTTTTTTTTGC AG CAGG AG C AG 

I I | 1 ) 1 | | | U I I GAGTCTCTCATC 
45 1 I I 1 11 1 1 I 1 1 1 CCGCCGTQTCCGC 

II I I II 1 1 I I I I I C CACGCACAGGC 
T 1 1 I I I I 1 I II I A CTCGGTCAGCCT 

55 T 1 1 I 1 I 1 1 I 1 1 IC ACACCATCCAGA 
'CACACCCTCCAGA 
"GCAG C AGG ATG AG 

XAGCCACCACAGC 

50 1 1 U I I I It I I I I CGTGGCTGGCCT 

60 1 U I I I 1 1 I I U I ACGGCGGAGCAG 
TTCTCACACCATCCA 
TTTGCGGCGGAGCAG 
TTTCTGAGCCGCCGT 
TTGGCGGAGCAGCAG 
TTCCGCTGCGGACAC 
TTTATAACC AGTTC G 
TTCACATCCTCCAGA 
TTCCGTGTCCGCGGC 
TTCGTGGACGACACA 
TTCCGCTGTGTCCGC 
TTGAAGAATGGGAAG 
CLAIMS



1. A method of identifying a set of extendible primers for use in
the identification, typing or classification of a nucleic acid of known
sequence having known polymorphisms, wherein:
i) all possible nucleotide sequences of a chosen length of the
nucleic acid are identified and their corresponding extendible primers,
ii) at least one extendible primer is removed from the set
wherein the at least one primer removed identifies a segment of the nucleic
acid identified by at least one other primer.
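
As an illustration only, and not as part of the claim, step i) can be sketched in Python. The sketch assumes that each primer is represented by the length-k subsequence it corresponds to on the strand being read, and that one further base must follow it so that an extension is possible. The function name `candidate_primers` and the toy allele sequences are hypothetical.

```python
def candidate_primers(alleles, k):
    """Collect every length-k subsequence that occurs in any allele.

    alleles: dict mapping an allele name to its sequence (assumed input format).
    Only positions followed by at least one more base are used, so that
    every candidate primer has a potential one-base extension.
    """
    primers = set()
    for seq in alleles.values():
        for i in range(len(seq) - k):
            primers.add(seq[i:i + k])
    return primers


# Toy data: two alleles that differ at a single position.
alleles = {"a1": "ACGTAC", "a2": "ACGTTC"}
primers = candidate_primers(alleles, 3)
```

Step ii) then prunes this candidate set; the primers removed are those whose information is already supplied by other primers.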

2. The method of claim 1, wherein between steps i) and ii):
ia) potential extensions for each primer are identified with
respect to each nucleotide sequence,
ib) for each extendible primer the identified potential extensions
are compared to determine which pairs of sequences can be discriminated
by the primer.
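
Steps ia) and ib) can be sketched as follows, again purely as an illustration. The sketch assumes a single-base extension, that the extension is the base directly following the primer's site in the sequence, and that "no extension" counts as an outcome distinguishable from any base. The names `extensions` and `discriminated_pairs` are hypothetical.

```python
from itertools import combinations


def extensions(primer, seq):
    """Return the set of bases that directly follow an occurrence of primer in seq."""
    k = len(primer)
    return frozenset(
        seq[i + k] for i in range(len(seq) - k) if seq[i:i + k] == primer
    )


def discriminated_pairs(primer, alleles):
    """Return the pairs of allele names whose extension sets differ for this primer."""
    ext = {name: extensions(primer, seq) for name, seq in alleles.items()}
    return {
        (a, b)
        for a, b in combinations(sorted(alleles), 2)
        if ext[a] != ext[b]
    }


# Toy data: the alleles differ at the base following "CGT".
alleles = {"a1": "ACGTAC", "a2": "ACGTTC"}
pairs_cgt = discriminated_pairs("CGT", alleles)  # extension A versus T
pairs_acg = discriminated_pairs("ACG", alleles)  # extension T in both
```

A primer that discriminates no pair, such as "ACG" here, carries no typing information for this set of sequences.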


3. The method of claim 1 or claim 2, wherein a matrix of primers 
and pairs of primer extensions is prepared in binary form and is subjected 
to analysis by a set covering problem (SCP) algorithm. 

4. The method of claim 3, wherein a greedy algorithm is used.
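
A minimal sketch of a greedy heuristic for the set covering problem of claims 3 and 4 is given below for illustration. It assumes that each row of the binary matrix is held as the set of sequence pairs that a primer discriminates. The function name `greedy_cover`, the primer labels and the pair labels are hypothetical, and the sketch is a textbook greedy heuristic rather than a statement of any particular implementation.

```python
def greedy_cover(coverage):
    """Choose primers until every discriminable pair is covered.

    coverage: dict mapping a primer to the set of sequence pairs it discriminates.
    At each step the primer covering the most still-uncovered pairs is taken;
    ties are broken by the sorted order of the primer labels.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Toy coverage: five sequence pairs, four candidate primers.
coverage = {
    "p1": {"a/b", "a/c", "a/d"},
    "p2": {"a/d", "b/c"},
    "p3": {"b/c", "b/d"},
    "p4": {"a/b"},
}
chosen = greedy_cover(coverage)
```

The greedy choice gives no guarantee of a minimum set, which is one reason a further heuristic or a redundancy check may be applied afterwards.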

5. The method of claim 3, wherein a CFT algorithm is used
which involves a Lagrangian relaxation heuristic.







6. The method of any one of claims 3 to 5, wherein a set of core 

primers is selected as a base for analysis by the SCP algorithm. 
7. The method of any one of claims 3 to 6, wherein the set of
extendible primers identified by the SCP algorithm is subjected to a
redundancy check.
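
One possible form of such a redundancy check is sketched below for illustration. It assumes the same representation as a set covering problem, in which each primer maps to the set of sequence pairs it discriminates, and it drops any selected primer whose pairs are all discriminated by the primers that remain. The function name `remove_redundant` and the toy data are hypothetical.

```python
def remove_redundant(chosen, coverage):
    """Drop selected primers that are not needed to keep every pair covered.

    chosen: list of selected primers, examined in the order given.
    coverage: dict mapping a primer to the set of sequence pairs it discriminates.
    """
    target = set().union(*(coverage[p] for p in chosen))
    kept = list(chosen)
    for p in chosen:
        others = [q for q in kept if q != p]
        still_covered = set().union(*(coverage[q] for q in others))
        if target <= still_covered:
            kept = others  # p adds nothing beyond the remaining primers
    return kept


# Toy data: p1 is made redundant by p2.
coverage = {
    "p1": {"a/b"},
    "p2": {"a/b", "a/c"},
    "p3": {"b/c"},
}
kept = remove_redundant(["p1", "p2", "p3"], coverage)
```

The result depends on the order in which primers are examined, so different orders may leave different, equally valid reduced sets.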

8. A set of extendible primers, for use in the identification, typing
or classification of a nucleic acid of known sequences having known
polymorphisms, identified by the method of any one of claims 1 to 7.

9. The set of extendible primers of claim 8, in the form of an
array.

10. The set of extendible primers of claim 8 or claim 9, for use in 
the identification, classification or typing of an organism, allele or gene 
selected from class 1 HLA, class 2 HLA and 16S rRNA. 


11. The set of extendible primers of any one of claims 8 to 10,
wherein the primers are arrayed on a surface of a support in such a way 
that recognisable patterns are formed with different types or alleles. 

12. A set of extendible primers, for use in the identification, typing
or classification of a human leucocyte antigen (HLA) gene as indicated, the
set comprising about the number of primers indicated and being capable of
distinguishing about the number of alleles indicated:





            HLA gene     Number of Alleles     Number of Primers

Class I     HLA-A        91                    172
            HLA-B        200                   <1000
            HLA-C        47                    94

Class II    DPA-1        11                    26
            DPB-1        74                    130
            DQA-1        17                    130
            DQB-1        34                    84
            DRB-1        192                   <1000
            DRB345       35                    94
13. A set of extendible primers, for use in the identification, typing
or classification of 16S rRNA, wherein the set comprises about 210 primers
and is capable of distinguishing at least about 1207 different sequences.

14. The set of extendible primers of claim 12 or claim 13, wherein

the primers have variable segments substantially as set out in appendix 1 
or appendix 2. 



15. A method of identification, typing or classification of a nucleic
acid of known sequence having known polymorphisms, by the use of the
set of extendible primers as claimed in any one of claims 8 to 14, which
method comprises applying the nucleic acid or fragments thereof to the set
of extendible primers under hybridisation conditions, and effecting
template-directed chain extension of extendible primers that have formed
hybrids.

16. The method of claim 15, wherein the set of extendible primers
is provided in the form of an array, and template-directed chain extension is
effected using labelled chain-terminating nucleotide analogues.






17. The method of claim 16, wherein template-directed chain
extension is effected using four different fluorescently-labelled chain
terminating nucleotide analogues, and the results are analysed by total
internal reflection fluorescence or confocal microscopy.






18. The method of any one of claims 15 to 17, wherein the
nucleic acid is a PCR amplimer.



19. The method of any one of claims 15 to 18, wherein the
nucleic acid is HLA Class 1 or HLA Class 2 or 16S rRNA or a PCR
amplimer thereof.
20. The method of any one of claims 15 to 19, wherein a
dUTP/uracil-DNA-glycosylase system is used to break the nucleic acid into
fragments.

21. A kit for use in the identification, typing or characterisation of
a nucleic acid of known sequence having known polymorphisms,
comprising the set of extendible primers as claimed in any one of claims 8
to 14.

22. The kit of claim 21, comprising also a pair of primers for
effecting PCR amplification of the nucleic acid.

23. An array of sets of extendible primers as claimed in any one
of claims 8 to 14, for the simultaneous identification, typing or classification
of two or more different HLA genes.

24. A computer readable storage medium having a program
recorded thereon, wherein the program consists of instructional steps for
identifying a set of extendible primers for use in the identification, typing or
classification of a nucleic acid of known sequence having known
polymorphisms, the steps comprising:
i) identifying all possible nucleotide sequences of a chosen
length of the nucleic acid and their corresponding extendible primers,
ii) removing at least one extendible primer from the set wherein
the at least one primer removed identifies a segment of the nucleic acid
identified by at least one other primer.
25. Computer readable program implement consisting of
instructional steps for identifying a set of extendible primers for use in the
identification, typing or classification of a nucleic acid of known sequence
having known polymorphisms, the steps comprising:
i) identifying all possible nucleotide sequences of a chosen
length of the nucleic acid and their corresponding extendible primers,
ii) removing at least one extendible primer from the set wherein
the at least one primer removed identifies a segment of the nucleic acid
identified by at least one other primer.



